AMENDMENTS TO THE SPECIFICATION 
PRIORITY 

This application claims priority to an application entitled "APPARATUS AND 
METHOD FOR PERFORMING MONTGOMERY TYPE MODULAR MULTIPLICATION", 
filed in the Korean Intellectual Property Office on March 14, 2003 and assigned Serial No. 2003-16100, 
the contents of which are hereby incorporated by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to the field of cryptography, and more particularly to 
an apparatus and method for performing a Montgomery type modular multiplication for use in 
information encryption/decryption and digital signature technology. 

2. Description of the Related Art 

In communication systems using smart cards and cyber money for electronic commerce, and in 
mobile communication devices such as cellular telephones and small-sized computers, it is 
desirable to transport information (electronic text or data) safely by encrypting/decrypting the 
information or by applying a digital signature process to it. Here, the term "digital signature" 
refers to a technique that "signs" electronic texts with an electronic signature in an electronic 
exchange of information, similar to signing conventionally done on paper. With the rapid increase 
in the number of Internet users and the frequent transmission of personal information over the 
Internet, there is a vital need for safe transmission of information through unsecured channels. 

Various algorithms such as RSA (Rivest-Shamir-Adleman), ElGamal, Schnorr, etc., have been 
proposed and employed for encryption/decryption techniques and digital signature technology 
using a public key system. Among these, ISO (International Organization for Standardization)/IEC 
(International Electrotechnical Commission) 9796, which is based on the RSA algorithm, has been 
adopted as an international standard; the DSA (Digital Signature Algorithm), a modification of 
ElGamal, has been adopted in the U.S.A.; GOSSTANDART (commonly abbreviated as "GOST") has 
been adopted in Russia; and KC-DSA has been adopted in Korea. In addition, many communication 
systems in current use have adopted the PKCS (Public Key Cryptography Standards). The 
above-mentioned algorithms require modular exponentiation, $m^e \bmod N$, which is computed by 
repeatedly performing modular multiplication, $A \cdot B \bmod N$. 
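To illustrate why modular multiplication dominates the cost of these algorithms, the following Python sketch computes $m^e \bmod N$ by left-to-right binary (square-and-multiply) exponentiation, in which every step is one modular multiplication of the form $A \cdot B \bmod N$. The function name `mod_exp` is hypothetical and is not taken from any of the cited standards.

```python
def mod_exp(m: int, e: int, n: int) -> int:
    """Compute m**e mod n with left-to-right square-and-multiply.

    Every update of `result` is one modular multiplication A * B mod N,
    the operation a modular multiplication apparatus is built to speed up.
    """
    result = 1 % n
    for bit in bin(e)[2:]:                 # scan exponent bits, most significant first
        result = (result * result) % n     # squaring step
        if bit == "1":
            result = (result * m) % n      # multiplication step
    return result


# Usage: agrees with Python's built-in three-argument pow().
assert mod_exp(7, 13, 101) == pow(7, 13, 101)
```

An n-bit exponent therefore costs about n squarings plus up to n further multiplications, which is why the speed and size of the modular multiplier determine the cost of an RSA operation.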

Many algorithms that perform the modular exponentiation and modular multiplication 
required to generate and verify a digital signature based on a public key cipher such as RSA 
have been proposed, for example: R. L. Rivest et al., "A Method for Obtaining Digital Signatures 
and Public-Key Cryptosystems," Communications of the ACM, Vol. 21, pp. 120-126, 1978; P. L. 
Montgomery, "Modular Multiplication Without Trial Division," Math. of Comp., Vol. 44, No. 
170, pp. 519-521, 1985; S. R. Dusse and B. S. Kaliski Jr., "A Cryptographic Library for the 
Motorola DSP5600," Proc. Eurocrypt '90, Springer-Verlag, pp. 230-244, 199?; and A. 
Bosselaers, R. Govaerts and J. Vandewalle, "Comparison of Three Modular Reduction 
Functions," Advances in Cryptology - CRYPTO '93, pp. 175-186, 1993. According to D. R. 
Stinson, "Cryptography", CRC Press, 1995, the Montgomery algorithm is the most efficient of 
these algorithms for the repeated modular multiplications required in modular exponentiation, 
but it is not efficient for a single, stand-alone modular multiplication. U.S. Patent No. 6,185,596 
discloses an example of an apparatus implementing the Montgomery algorithm. 
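For reference, a commonly used radix-2, bit-serial form of Montgomery multiplication can be sketched as below. It processes one multiplier bit per iteration and replaces trial division by a conditional addition of the odd modulus followed by an exact 1-bit right shift. This is a textbook illustration, assuming N is odd, $0 \le A, B < N$ and $A < 2^k$; it is not the apparatus of U.S. Patent No. 6,185,596 or of the present invention, and the function name is hypothetical.

```python
def montgomery_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2**(-k) mod n, consuming one multiplier bit per step.

    Assumes n is odd, 0 <= a, b < n and a < 2**k.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    t = 0
    for i in range(k):
        t += ((a >> i) & 1) * b        # add b if multiplier bit i is set
        if t & 1:                      # make t even by adding the odd modulus
            t += n
        t >>= 1                        # exact division by 2, no trial division
    return t - n if t >= n else t      # invariant t < 2n, so one subtraction suffices


n, k = 97, 7                            # 2**7 = 128 > 97
assert montgomery_radix2(45, 60, n, k) == 45 * 60 * pow(2, -k, n) % n
```

Because only one multiplier bit is consumed per iteration, an n-bit operand needs about n iterations; consuming two bits per clock roughly halves that count.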

As mentioned above, many algorithms and architectures have been proposed for the 
public key encryption/decryption and electronic signature. However, since modular 
multiplication apparatuses according to most of the proposed algorithms and architectures are 
designed for high-speed public key encryption/decryption, they have a disadvantage in that a 
great number of gates are required and a large amount of power is consumed. Therefore, they 
are not suitable for a resource-limited environment such as a smart card. 

SUMMARY OF THE INVENTION 

Therefore, the present invention has been made in view of the above problems, and it is 
an object of the present invention to provide a modular multiplication apparatus and method with 
fewer gates for high-speed encryption/decryption and electronic signature in a mobile 
communication environment including smart cards and mobile terminals. 

It is another object of the present invention to provide a modular multiplication apparatus 
and method with reduced power consumption for high-speed encryption/decryption 
and electronic signature in a mobile communication environment including smart cards and 
mobile terminals. 

It is still another object of the present invention to provide a modular multiplication 
apparatus and method that enable high-speed encryption/decryption and electronic signature 
with reduced power consumption in a mobile communication environment including smart cards and 
mobile terminals. 
To achieve the above objects, a modular multiplication apparatus for implementing an 
information encryption/decryption technique, in which a message (A) is encrypted/decrypted using a 
first key (B) and a second key (N), comprises: a storage having separate regions that store the 
message, the first key, and the second key, each n bits in length; a recoding logic that generates 
a first n+4-bit signal using the message and the first key at each clock pulse; a first carry save adder 
that generates a 3-bit sequence consisting of one carry value and two sum values using the first 
n+4-bit signal and two parallel n+4-bit input signals; a quotient logic that generates a 3-bit 
determiner for determining a modular reduction multiple using the 3-bit sequence and one carry 
value; a selector that generates a second n+4-bit signal using the second key and the 3-bit 
determiner; a second carry save adder that outputs a pair of sum and carry values using the second 
n+4-bit signal and the respective sum and carry terms output from the first carry save adder; and a 
first full adder that outputs a carry input value by performing a full addition operation on the pair of 
sum and carry values and a carry value output from the quotient logic at a previous clock. The separate 
regions are shift registers for the respective message, first key, and second key. The message is 
right-shifted by 2 bit positions at every clock. The first n+4-bit signal is one of 0, B, 2B, -B, and -2B. 
The second n+4-bit signal is one of 0, N, 2N, -N, and -2N. 

The recoding logic comprises a Booth recoding circuit for performing Booth recoding with 
two low-order bits of the message; a multiplexer for multiplexing the two low-order bits and the first 
key so as to output one of 0, B, and 2B; and a one's complementary operator for selectively 
performing a one's complementary operation on the n+1-bit signal output from the multiplexer 
according to the two low-order bits so as to generate one of 0, B, 2B, -B, and -2B. 
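The radix-4 (modified) Booth recoding of two low-order multiplier bits can be sketched as follows. Each digit is $d = -2a_{2j+1} + a_{2j} + a_{2j-1}$ with $a_{-1} = 0$, so $d \in \{-2, -1, 0, 1, 2\}$ and the selected partial product is one of 0, ±B, ±2B. A negative multiple is modeled, in the manner of a one's complementary operator, by inverting |d|·B within a fixed register width and supplying the missing +1 as a separate correction bit. The helper names and the `width` parameter are assumptions for illustration, not the circuit of the invention.

```python
def booth_radix4_digits(a: int, count: int) -> list:
    """Radix-4 Booth digits of a two's-complement multiplier, least significant first."""
    digits = []
    ref = 0                                   # reference bit a_{-1} = 0
    for j in range(count):
        lo = (a >> (2 * j)) & 1               # a_{2j}
        hi = (a >> (2 * j + 1)) & 1           # a_{2j+1}
        digits.append(-2 * hi + lo + ref)     # digit in {-2, -1, 0, 1, 2}
        ref = hi                              # a_{2j+1} is the next reference bit
    return digits


def partial_product(d: int, b: int, width: int) -> tuple:
    """Return (vector, correction) with vector + correction == d*b (mod 2**width)."""
    mask = (1 << width) - 1
    magnitude = abs(d) * b                    # multiplexer output: 0, B or 2B
    if d < 0:
        return (~magnitude) & mask, 1         # one's complement; +1 supplied separately
    return magnitude & mask, 0


a, b, width = -77, 45, 12
digits = booth_radix4_digits(a, 5)            # 5 digits cover a 10-bit two's-complement range
assert sum(d * 4**j for j, d in enumerate(digits)) == a
for d in digits:
    vec, corr = partial_product(d, b, width)
    assert (vec + corr) % (1 << width) == (d * b) % (1 << width)
```

Using a one's complement plus a deferred +1 avoids a full two's-complement negation (a carry chain) in the partial-product path; the +1 can be absorbed later by an adder that already has a spare carry input.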

The first carry save adder comprises n+4 second full adders, each performing a full addition 
operation on corresponding sum and carry bits of the two parallel n+4-bit input signals and a 
corresponding bit of the first n+4-bit signal so as to produce the 3-bit sequence. A first one of the 
two parallel n+4-bit input signals is created by selecting the high-order n+2 bits from a sum term of 
the second carry save adder and inserting 2 bits, both zeros, as higher-order bits of the selected 
n+2 bits. A second one of the two parallel n+4-bit input signals is created by selecting the higher 
n+3 bits from a carry term of the second carry save adder and inserting one zero bit as the 
higher-order bit of the selected n+3 bits. 
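The different bit selections for the sum and carry terms can be seen in a small model of a carry-save adder. The model assumes that each carry bit is stored at the position of the full adder that produced it, so the carry vector has an implicit weight of 2; under that assumption, dividing the redundant value S + 2C by 4 drops two sum bits but only one carry bit, and the dropped bits yield at most one carry into the next step. The function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple:
    """One row of parallel full adders: x + y + z == s + 2*c, with no carry propagation."""
    s = x ^ y ^ z                             # per-position sum bits
    c = (x & y) | (x & z) | (y & z)           # per-position carry bits (weight 2**(i+1))
    return s, c


def divide_by_4(s: int, c: int) -> tuple:
    """Split (s + 2*c) / 4 into shifted vectors plus a carry from the dropped low bits."""
    low = (s & 0b11) + 2 * (c & 1)            # value of the two sum LSBs and the carry LSB
    if low % 4:
        raise ValueError("s + 2*c is not divisible by 4")
    return s >> 2, c >> 1, low >> 2           # (sum drops 2 bits, carry drops 1 bit, carry-in)


s, c = carry_save_add(45, 27, 52)             # 45 + 27 + 52 = 124
assert s + 2 * c == 124
assert sum(divide_by_4(s, c)) == 124 // 4
```

No carry ripples across the word in `carry_save_add`, which is what lets each clock complete in the delay of a single full adder regardless of n.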

The quotient logic comprises a D flip-flop for temporarily storing the carry input value from 
the first full adder; a third full adder for performing a full addition operation on the carry input value 
and a sum value output from a least significant full adder of the first carry save adder, in 
consideration of a sign of the first key; an exclusive OR (XOR) logic gate for performing an exclusive 
OR operation on the carry value output from the least significant full adder of the first carry save 
adder, a sum value output from a second least significant full adder of the first carry save adder, 
and the carry value output from the quotient logic at a previous clock; and a combinational circuit 
for combining the outputs of the third full adder and the exclusive OR logic gate and a second least 
significant bit (n_1) of the second key so as to output the 3-bit determiner signal. 
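Why the quotient logic needs only the low-order bits of the running sum and the bit n_1 follows from the congruence it must satisfy. With N odd, $N^{-1} \equiv N \pmod 4$, and $N \bmod 4 = 1 + 2n_1$, so a digit q with $T + qN \equiv 0 \pmod 4$ depends only on $T \bmod 4$ and n_1. The sketch below picks one valid digit from {-1, 0, 1, 2}; it does not reproduce the hardware encoding of the determiner q_2q_1q_0, and the function name is hypothetical.

```python
def quotient_digit(t_low2: int, n: int) -> int:
    """Signed radix-4 digit q with (t_low2 + q*n) % 4 == 0, for odd n."""
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    n1 = (n >> 1) & 1                 # second least significant bit of N
    n_inv_mod4 = 1 + 2 * n1           # N mod 4, which is also N^-1 mod 4
    r = (-t_low2 * n_inv_mod4) % 4    # required digit modulo 4
    return r - 4 if r == 3 else r     # representative in {-1, 0, 1, 2}


for n in (97, 99):                    # 97 = 1 (mod 4), 99 = 3 (mod 4)
    for t in range(4):
        q = quotient_digit(t, n)
        assert (t + q * n) % 4 == 0
```

Since the reduction multiple is chosen so that the two lowest bits become zero, the subsequent division by 4 is an exact shift, which is the radix-4 analogue of Montgomery's "division without trial division".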

The second carry save adder comprises n+4 fourth full adders, each performing a full addition 
operation on corresponding sum and carry bits from the first carry save adder, except for the least 
significant sum bit and the most significant carry bit, and a corresponding bit of the second n+4-bit 
signal so as to produce the pairs of sum and carry bits. The first full adder performs full addition 
with a sum value output from a second least significant full adder of the second carry save adder, a 
carry value output from a least significant full adder of the second carry save adder, and a carry 
value (cin) of the quotient logic at a previous clock so as to produce the carry input value. 

The modular multiplication apparatus further comprises a carry propagation adder for 
performing a carry propagation addition operation on the sum and carry terms output from the 
second carry save adder after m+2 clocks, where m = n/2. The carry propagation adder adds the modulus 
(second key) to the result of the carry propagation addition operation if the output of the carry 
propagation adder is a negative value. 
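The final step can be modeled as a carry-propagate addition of the stored sum and carry vectors in a fixed-width two's-complement register, followed by a single addition of N when the result is negative. The register width, the implicit weight of 2 on the carry vector, and the helper name are assumptions made for this sketch.

```python
def carry_propagate_and_correct(s: int, c: int, n: int, width: int) -> int:
    """Resolve a redundant (s, c) pair to an integer and add n once if it is negative."""
    mask = (1 << width) - 1
    total = (s + 2 * c) & mask           # carry-propagate addition, truncated to the register
    if total >> (width - 1):             # sign bit set: negative two's-complement value
        total -= 1 << width
    if total < 0:
        total += n                       # single correction by the modulus (second key)
    return total


width, n, value = 12, 101, -37
c = 0b10110                              # arbitrary carry vector
s = (value - 2 * c) & ((1 << width) - 1) # sum vector chosen so that s + 2c encodes value
assert carry_propagate_and_correct(s, c, n, width) == value + n
```

Only this last addition needs a full carry chain; all earlier clocks keep the value in redundant sum/carry form.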

In another aspect of the present invention, the modular multiplication device for 
implementing a message encryption/decryption technique in which the message (A) is 
encrypted/decrypted using a first key (B) and a second key (N) comprises: a storage having separate 
regions for storing the message, the first key, and the second key, each of n bits; a recoding logic for 
generating a first n+3-bit signal using the message and the first key at each clock; a first carry save 
adder for outputting a 3-bit sequence consisting of one carry value and two sum values by performing 
a first carry save addition operation with the first n+3-bit signal and two parallel n+3-bit input signals; a 
quotient logic for generating a 2-bit determiner for determining a modular reduction multiple by 
performing a quotient operation with the 3-bit sequence and one carry value; a selector for generating 
a second n+3-bit signal using the second key and the 2-bit determiner; a second carry save adder for 
outputting a pair of sum and carry values by performing a second carry save addition operation with 
the second n+3-bit signal and the respective sum and carry terms output from the first carry save addition 
operation; and an AND logic gate for outputting a carry input value by performing an AND operation 
on the pair of sum and carry values. The separate regions are shift registers for the respective 
message, first key, and second key. The message is shifted by 2 positions at every clock. The first 
n+3-bit signal is one of 0, B, 2B, and 3B. The second n+3-bit signal is one of 0, N, 2N, and 3N. 
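The second embodiment avoids signed digits by selecting among the unsigned multiples 0, B, 2B, 3B and 0, N, 2N, 3N. A minimal software model of that radix-4 Montgomery recurrence is sketched below, assuming N is odd, $0 \le A, B < N$, and that 3B and 3N are simply precomputed (how the hardware forms them is not modeled). The final conditional subtraction is added by the sketch to return a canonical residue; the function name is hypothetical.

```python
def montgomery_radix4_unsigned(a: int, b: int, n: int, n_bits: int) -> int:
    """Return a * b * 4**(-(m+2)) mod n with m = n_bits // 2, using digits 0..3."""
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    m = n_bits // 2
    b_multiples = [0, b, 2 * b, 3 * b]       # selected by two multiplier bits
    n_multiples = [0, n, 2 * n, 3 * n]       # selected by the 2-bit determiner
    n_inv_mod4 = n % 4                       # N^-1 mod 4 equals N mod 4 for odd N
    t = 0
    for i in range(m + 2):
        a_digit = (a >> (2 * i)) & 0b11      # multiplier shifted right by 2 bits per clock
        t += b_multiples[a_digit]
        q = (-t * n_inv_mod4) % 4            # makes t + q*n divisible by 4
        t = (t + n_multiples[q]) >> 2        # exact division by 4
    return t - n if t >= n else t            # t < 2n holds throughout


n = 1000003                                  # odd 20-bit modulus, so m = 10
assert montgomery_radix4_unsigned(123456, 654321, n, 20) == 123456 * 654321 * pow(4, -12, n) % n
```

The invariant $T < 2N$ follows from $(2N + 3B + 3N)/4 < 2N$ when $B < N$, so the running value never grows beyond a fixed number of extra bits.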

The recoding logic is a multiplexer which multiplexes two lower bits of the message and n bits 
of the first key so as to output the first n+3-bit signal. The first carry save adder comprises n+3 first full 
adders, each performing a full addition operation on corresponding sum and carry bits of the two 
parallel n+3-bit input signals and a corresponding bit of the first n+3-bit signal so as to produce the 
3-bit sequence. A first one of the two parallel n+3-bit input signals is created by selecting the high-order 
n+1 bits from a sum term of the second carry save adder and inserting two bits, both zeros, as 
higher-order bits of the selected n+1 bits. A second one of the two parallel n+3-bit input signals is 
created by selecting the higher-order n+2 bits from a carry term of the second carry save adder and 
inserting one zero bit as the higher-order bit of the selected n+2 bits. 

The quotient logic comprises a D flip-flop for temporarily storing the carry input value from 
the AND logic; a half adder for performing a half addition operation with the carry input value and a 
sum value output from the least significant full adder of the first carry save adder; an exclusive OR 
(XOR) logic gate for performing an exclusive OR operation with a carry value output from the 
least significant full adder of the first carry save adder, a sum value output from a second least 
significant full adder, and an output of the half adder; and a combinational circuit for combining the 
outputs of the half adder and the exclusive OR logic and a second least significant bit (n_1) of the 
second key so as to output the 2-bit determiner signal. 

The second carry save adder comprises n+3 second full adders, each performing a full 
addition operation on corresponding sum and carry bits from the first carry save adder, except for 
the least significant sum bit and the most significant carry bit, and a corresponding bit of the second 
n+3-bit signal so as to produce the pairs of sum and carry bits. The AND logic performs an AND 
operation on a sum value output from a second least significant second full adder of the second 
carry save adder and a carry value output from a least significant second full adder of the second 
carry save adder so as to produce the carry input value. The modular multiplication apparatus further 
comprises a carry propagation adder for performing a carry propagation addition operation on the 
sum and carry terms output from the second carry save adder after m+2 clocks. 

In another aspect of the present invention, the modular multiplication method for 
implementing a message encryption/decryption technique in which a message (A) is 
encrypted/decrypted using a first key (B) and a second key (N) comprises: storing the message, first 
key, and second key of n bits in respective storages; generating a first n+4-bit signal using the 
message and first key at each clock; outputting a 3-bit sequence consisting of one carry value and two 
sum values by performing a first carry save addition operation with the first n+4-bit signal and two 
parallel n+4-bit input signals; generating a 3-bit determiner for determining a modular reduction 
multiple by performing a quotient operation with the 3-bit sequence and one input carry value; 
generating a second n+4-bit signal using the second key and the 3-bit determiner; outputting a pair of 
sum and carry values by performing a second carry save addition operation with the second n+4-bit 
signal and the respective sum and carry terms output from the first carry save addition operation; 
and outputting a carry input value by performing a full addition operation with the pair of sum and carry 
values and a carry value output from the quotient logic at a previous clock. The message is 
right-shifted by 2 bits at every clock. 

The procedure of generating the first n+4-bit signal includes performing Booth recoding 
with two low-order bits of the message, and generating one of 0, B, 2B, -B, and -2B according to the 
two low-order bits. A first one of the two parallel n+4-bit input signals is created by selecting the 
high-order n+2 bits from a sum term output by the second carry save addition operation and inserting 2 
bits, both zeros, as higher-order bits of the selected n+2 bits. A second one of the two parallel n+4-bit 
input signals is created by selecting the higher n+3 bits from a carry term of the second carry save 
addition operation and inserting one zero bit as the higher-order bit of the selected n+3 bits. The 
3-bit sequence includes two sum values and one carry value. The two sum values are the least 
significant bit and the second least significant bit of a sum term output from the first carry save 
addition operation, and the one carry value is the least significant bit of a carry term output from 
the first carry save addition operation. The one input carry value is the carry input value generated 
by the full addition operation. The second n+4-bit signal is selected from among 0, N, 2N, -N, and 
-2N according to two low-order bits of the 3-bit determiner. The pair of sum and carry values are the 
second least significant bit of a sum term and the least significant bit of the carry term output from the 
second carry save addition operation. The most significant bits of the sum and carry terms output 
from the first carry save addition operation are ignored. The modular multiplication method further 
comprises performing a carry propagation addition operation on the sum and carry terms after m+2 
clocks, where m = n/2. The modular multiplication method further comprises adding the modulus 
(second key) if an output of the carry propagation addition operation is a negative value. 
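Why m+2 clocks produce the factor $R^{-1}$ with $R = 4^{m+2}$ can be stated compactly. In the following derivation, $T_i$ denotes the running value (sum plus carry) after clock i, $d_i$ the Booth digit of A consumed at clock i, and $q_i$ the reduction digit chosen so that each division by 4 is exact; these symbols are introduced here for the derivation only, and A is assumed to be fully represented by its m+2 Booth digits.

$$
\begin{aligned}
T_0 &= 0, \qquad T_{i+1} = \frac{T_i + d_i B + q_i N}{4}, \quad i = 0, \dots, m+1,\\
4^{m+2}\, T_{m+2} &= \Big(\sum_{i=0}^{m+1} 4^{i} d_i\Big) B + \Big(\sum_{i=0}^{m+1} 4^{i} q_i\Big) N = A\,B + Q\,N,\\
T_{m+2} &\equiv A\,B\,4^{-(m+2)} = A\,B\,R^{-1} = A\,B\,2^{-(n+4)} \pmod N .
\end{aligned}
$$

The second line follows by multiplying each recurrence step by $4^{i+1}$ and summing over i; the last equality uses $m = n/2$, so $4^{m+2} = 2^{n+4}$.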

In still another aspect of the present invention, a modular multiplication method for 
implementing a message encryption/decryption technique in which a message (A) is 
encrypted/decrypted using a first key (B) and a second key (N) comprises: storing the message, 
first key, and second key of n bits in respective storages; generating a first n+3-bit signal using 
the message and first key at each clock; outputting a 3-bit sequence consisting of one carry value 
and two sum values by performing a first carry save addition operation with the first n+3-bit 
signal and two parallel n+3-bit input signals; generating a 2-bit determiner for determining a 
modular reduction multiple by performing a quotient operation with the 3-bit sequence and one 
input carry value; generating a second n+3-bit signal using the second key and the 2-bit 
determiner; outputting a pair of sum and carry values by performing a second carry save addition 
operation with the second n+3-bit signal and the respective sum and carry terms output from the 
first carry save addition operation; and outputting a carry input value by performing an AND operation 
on the pair of sum and carry values. The message is right-shifted by 2 bits at every clock. The 
first n+3-bit signal is produced by multiplexing two low-order bits of the message and the first 
key. The first n+3-bit signal is one of 0, B, 2B, and 3B. A first one of the two parallel n+3-bit 
input signals is created by selecting the high-order n+1 bits from a sum term of the second carry save 
addition operation and inserting two bits, both zeros, as higher-order bits of the selected n+1 bits. 
A second one of the two parallel n+3-bit input signals is created by selecting the higher-order n+2 
bits from a carry term of the second carry save addition operation and inserting one zero bit as the 
higher-order bit of the selected n+2 bits. 

The 3-bit sequence includes two sum values and one carry value. The two sum values are 
the least significant bit and the second least significant bit of a sum term output from the first carry 
save addition operation, and the one carry value is the least significant bit of a carry term output 
from the first carry save addition operation. The one input carry value is the carry input value 
generated by the AND operation. The second n+3-bit signal is selected from among 0, N, 2N, and 
3N according to the 2-bit determiner. The pair of sum and carry values are the second least significant 
bit of a sum term and the least significant bit of the carry term output from the second carry 
save addition operation. The most significant bits of the sum and carry terms output from the 
first carry save addition operation are ignored. 

The modular multiplication method further comprises performing a carry propagation 
addition operation on the sum and carry terms output from the second carry save addition 
operation after m+2 clocks. 

According to one aspect of the present invention, there is provided a signal processing 
apparatus for performing modular multiplication for use in a signal processing system. The 
apparatus includes a first logic for outputting a signed multiplicand by selectively performing a 
one's complementary operation on a multiplicand according to a Booth conversion result of a 
multiplier in modular multiplication; a second logic for outputting a modulus, signed for the 
modular multiplication, based on a carry input value Carry-in of a current clock, determined from a 
carry value cin for correction of a previous clock, and on a sign bit of the multiplicand; and a third 
logic for receiving the signed multiplicand and the signed modulus, and calculating a result value of 
the modular multiplication by iteratively performing a full addition operation on a carry value C and 
a sum value S of the full addition operation found at the previous clock. 

According to another aspect of the present invention, there is provided a signal 
processing method for performing modular multiplication for use in a signal processing system. 
The method includes outputting a signed multiplicand by selectively performing a one's 
complementary operation on a multiplicand according to a Booth conversion result of a 
multiplier in modular multiplication; finding a carry input value Carry-in of a current clock 
determined from a carry value cin for correction of a previous clock; outputting a modulus which 
is signed in the modular multiplication based on the carry input value and a sign bit of the 
multiplicand; and receiving the signed multiplicand and the signed modulus, and calculating a 
result value of the modular multiplication by iteratively performing a full addition operation on a 
carry value C and a sum value S of the full addition operation, found at the previous clock. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and other objects, features and other advantages of the present invention will be 
more clearly understood from the following detailed description taken in conjunction with the 
accompanying drawings, in which: 

Fig. 1 is a block diagram showing a configuration of a modular multiplication apparatus in 
accordance with a first embodiment of the present invention; 

Fig. 2 is a block diagram showing a detailed configuration of a recoding conversion circuit 
shown in Fig. 1; 

Fig. 3 is a block diagram showing a detailed configuration of the first carry save adder shown 
in Fig. 1; 

Fig. 4 is a block diagram showing a detailed configuration of the quotient logic shown in Fig. 1; 

Fig. 5 is a block diagram showing a detailed configuration of the second carry save adder 
shown in Fig. 1; 

Fig. 6 is a block diagram showing a detailed configuration of the full adder shown in Fig. 1; 

Fig. 7 is a block diagram showing a configuration of a modular multiplication apparatus in 
accordance with a second embodiment of the present invention; 

Fig. 8 is a block diagram showing a detailed configuration of a recoding conversion circuit 
shown in Fig. 7; 

Fig. 9 is a block diagram showing a detailed configuration of the first carry save adder shown 
in Fig. 7; 

Fig. 10 is a block diagram showing a detailed configuration of the quotient logic shown in Fig. 7; 

Fig. 11 is a block diagram showing a detailed configuration of the second carry save adder 
shown in Fig. 7; 

Fig. 12 is a diagram showing a detailed configuration of the full adder shown in Fig. 7; and 
Fig. 13 is a block diagram showing an example of application of the modular multiplication 
apparatuses in accordance with the embodiments of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Preferred embodiments of the present invention will now be described in detail with 
reference to the annexed drawings. In the drawings, the same or similar elements are denoted by 
the same reference numerals even though they are depicted in different drawings. In the 
following description, a detailed description of known functions and configurations incorporated 
herein will be omitted when it may obscure the subject matter of the present invention. 
A. Outline of the Invention 

In the following description, the present invention discloses an apparatus and method for 
performing a modular multiplication, $A \cdot B \bmod N$, by using a Montgomery algorithm, where 
$A = a_{n-1} \cdot 2^{n-1} + \cdots + a_1 \cdot 2 + a_0$, 
$B = b_{n-1} \cdot 2^{n-1} + \cdots + b_1 \cdot 2 + b_0$, and 
$N = n_{n-1} \cdot 2^{n-1} + \cdots + n_1 \cdot 2 + n_0$. 

Here, A is a multiplier, B is a multiplicand, and N is a modulus, the bit size of each of 
which can be large, for example, 512 or 1024. 

The modular multiplication, $A \cdot B \bmod N$, is implemented by two embodiments, which will be 
described below. Each embodiment provides a modular multiplication apparatus and method for 
calculating $A \cdot B \cdot R^{-1} \bmod N$ in m+2 clocks, receiving as inputs A, B and N, each n bits 
in length (where $R = 4^{m+2}$, $m = n/2$, and $-N < A, B < N$). $A \cdot B \bmod N$ can be calculated 
by using a multiplication result of the suggested modular multiplication apparatus. The modular 
exponentiation $m^e \bmod N$, which is required to perform the RSA operation, can be derived from the 
calculated $A \cdot B \bmod N$. Figs. 1 to 6 of the drawings are block diagrams showing the configuration 
of the elements of the modular multiplication apparatus in accordance with a first embodiment of the 
present invention, and Figs. 7 to 12 are block diagrams showing the configuration of the elements of 
the modular multiplication apparatus in accordance with a second embodiment of the present 
invention. Fig. 13 is a block diagram of an IC card to which the modular multiplication apparatuses 
in accordance with the embodiments of the present invention are applicable. 
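How $A \cdot B \bmod N$ and $m^e \bmod N$ are obtained from a unit that returns $A \cdot B \cdot R^{-1} \bmod N$ is the standard Montgomery-domain technique: multiply once more by the precomputed constant $R^2 \bmod N$. The sketch below uses a plain integer function `mont_product` as a stand-in for the apparatus output; this stand-in and the other function names are hypothetical, and $R = 2^{n+4}$ is taken from the definition $R = 4^{m+2}$ with $m = n/2$.

```python
def mont_product(x: int, y: int, n: int, r: int) -> int:
    """Stand-in for the apparatus output: x * y * r**(-1) mod n (r coprime to n)."""
    return x * y * pow(r, -1, n) % n


def mod_mul_via_montgomery(a: int, b: int, n: int, r: int) -> int:
    """A * B mod N from two Montgomery products and the constant r**2 mod n."""
    r2 = r * r % n                          # precomputed once per modulus
    return mont_product(mont_product(a, b, n, r), r2, n, r)


def mod_exp_via_montgomery(m: int, e: int, n: int, r: int) -> int:
    """m**e mod n with every multiplication performed as a Montgomery product."""
    r2 = r * r % n
    x = mont_product(m, r2, n, r)           # m * R mod n: enter the Montgomery domain
    acc = r % n                             # 1 * R mod n
    for bit in bin(e)[2:]:
        acc = mont_product(acc, acc, n, r)  # squaring stays in the domain
        if bit == "1":
            acc = mont_product(acc, x, n, r)
    return mont_product(acc, 1, n, r)       # leave the Montgomery domain


n, r = 1000003, 2 ** 24                     # a 20-bit modulus, so R = 2**(20 + 4)
assert mod_mul_via_montgomery(123456, 654321, n, r) == 123456 * 654321 % n
assert mod_exp_via_montgomery(7, 65537, n, r) == pow(7, 65537, n)
```

Inside the exponentiation loop the factor $R^{-1}$ of each product cancels against the factor R carried by the operands, so the conversion cost is paid only at entry and exit.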

Embodiments of the present invention provide modular multiplication apparatuses in which the bits 
of the multiplier are sequentially shifted to generate a shifted bit string, and a partial sum is 
calculated by expressing it as a one's complement number according to the value obtained by 
Booth recoding the two lower bits of the generated bit string. In contrast with conventional modular 
multiplication apparatuses, wherein only a single lower bit generated by sequentially shifting the bits 
of the multiplier is processed, the present invention allows the multiplication to be performed 
at higher speed by recoding two lower bits of the multiplier at a time. 
The modular multiplication apparatuses in accordance with the embodiments of the present 
invention include modified recoding conversion logics and other elements configured in compliance 
with the modified recoding conversion logics for performing the modular multiplication operation 
according to the Montgomery algorithm. 

B. First Embodiment 

B-1. Configuration of the Invention 

Fig. 1 is a block diagram showing a configuration of a modular multiplication apparatus in 
accordance with the first embodiment of the present invention. 

Referring to Fig. 1, the modular multiplication apparatus includes a recoding conversion logic 
110, a first carry save adder (hereinafter abbreviated as "CSA1") 120, a quotient logic 130, a selector 
140, a second CSA ("CSA2") 150, and a full adder (FA) 160. The modular multiplication apparatus 
is a hardware device for calculating $A \cdot B \cdot R^{-1} \bmod N$ in m+2 clocks according to a 
Montgomery algorithm, with A, B and N each having n input bits (where $R = 4^{m+2}$, $m = n/2$, and 
$-N < A, B < N$). That is, the modular multiplication apparatus calculates $A \cdot B \cdot 2^{-(n+4)} \bmod N$. 
Herein, A is called a multiplier, B is called a multiplicand, and N is called a modulus. 
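At the algorithm level, the first embodiment can be modeled as a radix-4 Montgomery multiplication with Booth digits of A in {-2, ..., 2} and reduction digits in {-1, 0, 1, 2} (that is, reduction multiples 0, N, 2N and -N), run for m+2 iterations so that $R = 4^{m+2} = 2^{n+4}$. The sketch replaces the carry-save vectors, the one's-complement correction bits and the fixed register widths with exact Python integers, and its final `% n` is a normalization for comparison rather than the hardware's correction step. It assumes an odd modulus with an even bit length n; the function name is hypothetical.

```python
def montgomery_radix4_booth(a: int, b: int, n: int, n_bits: int) -> int:
    """Return a * b * 2**(-(n_bits + 4)) mod n; a and b may be negative (|a|, |b| < n)."""
    if n % 2 == 0 or n_bits % 2:
        raise ValueError("expects an odd modulus and an even bit length n")
    m = n_bits // 2
    n_mod4 = n % 4                           # equals N^-1 mod 4 for odd N
    t, ref = 0, 0                            # running value and Booth reference bit
    for _ in range(m + 2):
        lo, hi = a & 1, (a >> 1) & 1         # two low-order bits of the shifted multiplier
        d = -2 * hi + lo + ref               # Booth digit: selects 0, +-B or +-2B
        ref, a = hi, a >> 2                  # multiplier shifted right by 2 bits per clock
        t += d * b
        r = (-t * n_mod4) % 4                # reduction digit modulo 4
        q = r - 4 if r == 3 else r           # selects 0, N, 2N or -N
        t = (t + q * n) // 4                 # exact: t + q*n is a multiple of 4
    return t % n                             # canonical residue for comparison


n = 1000003                                  # odd 20-bit modulus: m = 10, R = 2**24
assert montgomery_radix4_booth(-123456, 654321, n, 20) == -123456 * 654321 * pow(2, -24, n) % n
```

Because Python's `>>` is an arithmetic shift on negative integers, the same loop handles a negative multiplier A, mirroring the signed range $-N < A < N$ assumed in the text.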

Each of the CSAs 120 and 150 is composed of (n+4) full adders in parallel, each of which 
has a 3-bit input and outputs a carry bit and a sum bit. The recoding conversion logic 110 performs 
a modified Booth recoding operation, together with the corresponding selective one's 
complementary function, based on the multiplier A, and outputs one of the values 0, ±B, and ±2B as a 
sign-extended (n+4)-bit value. The quotient logic 130 receives as its inputs a least significant bit 
(LSB) carry value $C_{1,0}$ and two sum LSB bits $S_{1,1}$ and $S_{1,0}$ from the CSA1 120, a carry input 
value 'carry-in' output from the full adder 160, and a sign bit (B sign in FIG. 4) of B, and outputs a 
3-bit value $q_2q_1q_0$, which determines the multiple used for the modular reduction. The selector 140, 
which can be implemented by multiplexers (MUXs), selects and outputs one of 0, N, 2N, and -N 
(see FIG. 1) based on the determined 3-bit value $q = q_2q_1q_0$. The full adder 160 performs a full 
addition operation with two bits $S_{2,1}$ and $C_{2,0}$ output from the CSA2 150 and a carry value cin for 
correction of the current clock as its inputs, and provides the result of the full addition to the quotient 
logic 130 at the next clock as the carry-in signal used in the quotient logic 130. 

Although not shown in detail in Fig. 1, it should be noted that the modular multiplication 
apparatus includes temporary storage registers C and R for storing the carry values and sum values, 
which are the outputs of the CSA1 120 and CSA2 150, respectively, at each clock, and a carry 
propagation adder for adding the values stored in the temporary storage registers C and R and 
outputting the resultant value as the result of the modular multiplication. 

Fig. 2 is a block diagram showing a detailed configuration of the conversion logic 110 shown in Fig. 1.

Referring to Fig. 2, the conversion logic 110 Booth-converts the two lesser bits (a_{i+1}, a_i) and a reference bit a_{i-1} of a bit string generated by sequentially shifting the bits of the multiplier A, multiplexes the multiplicand B according to the Booth-converted result value z_{i+1}, and outputs signed binary numbers of (n+4) bits. In Fig. 2, the Booth-converted result value z_{i+1} is shown separately for its individual output bits, as z_{i+1}[2], z_{i+1}[1], and z_{i+1}[0]. The multiplicand B is multiplexed according to z_{i+1}[1] and z_{i+1}[0], and the output of the multiplexer (MUX) 114 is selectively complemented according to z_{i+1}[2], which is input to a one's complementer 116. Therefore, z_{i+1}[2] will be called a sign bit. For this purpose, a shift register 102 for sequentially shifting the bits of the multiplier A to generate a shifted bit string and a register 104 for storing the multiplicand B are provided at the front stage of the conversion logic 110. The conversion logic 110 includes a Booth conversion circuit 112, the multiplexer (MUX) 114, and the one's complementer 116. The Booth conversion circuit 112 Booth-converts the two lesser bits a_{i+1} and a_i of the generated bit string and the reference bit a_{i-1}, and outputs a 3-bit result value z_{i+1} = (z_{i+1}[2], z_{i+1}[1], z_{i+1}[0]). The multiplexer 114 multiplexes the multiplicand B according to the Booth conversion result z_{i+1}, and outputs 0, B, or 2B. The one's complementer 116 selectively performs a one's complement operation on the output of the multiplexer 114 according to the two lesser bits of the generated bit string, and outputs signed binary numbers of (n+4) bits. The conversion logic 110, which is a circuit implementing a modified Booth conversion based on the multiplier A, thus outputs a sign-extended (n+4)-bit value that is one of 0, ±B, and ±2B.
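To make the selection-and-complement path concrete, here is a minimal Python model of the MUX 114 and one's complementer 116. It is a sketch under stated assumptions, not the circuit: the Booth digit is assumed to be already available as an integer in {-2, ..., 2}, the width is fixed at 16 bits (n = 12), and the name `partial_product` is hypothetical. It shows why a separate correction carry is needed: inverting |digit|·B yields -|digit|·B - 1, so an extra 1 must be added downstream to obtain the exact partial product.

```python
WIDTH = 16                      # n + 4 bits, assuming n = 12 as in Equation 3
MASK = (1 << WIDTH) - 1


def partial_product(b: int, digit: int) -> tuple[int, int]:
    """Model the MUX + one's-complementer path for one Booth digit.

    Returns (bits, correction): `bits` is the WIDTH-bit word handed to the
    CSA, and `correction` is the +1 still owed because one's-complement
    negation gives -x - 1 rather than -x.
    """
    if digit not in (-2, -1, 0, 1, 2):
        raise ValueError("radix-4 Booth digits lie in {-2, -1, 0, 1, 2}")
    selected = (abs(digit) * b) & MASK       # MUX output: 0, B or 2B
    if digit < 0:
        return ~selected & MASK, 1           # invert every bit, owe +1
    return selected, 0


b = 0x5C3
for d in (-2, -1, 0, 1, 2):
    bits, corr = partial_product(b, d)
    # one's-complement word + correction == two's-complement value of d * B
    assert (bits + corr) & MASK == (d * b) & MASK
print(f"{partial_product(b, -2)[0]:016b}")   # 1111010001111001
```

For B = 0x5C3 and a digit of -2 the word produced is 1111.0100.0111.1001, the complemented 2B value listed in Equation 3.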

Fig. 3 is a block diagram showing a detailed configuration of the CSA1 120 shown in Fig. 1. 

Referring to Fig. 3, the CSA1 120, having (n+4) full adders 121 to 125, receives as its inputs (n+2)-bit first signals S_{2,2} to S_{2,n+3}, (n+3)-bit second signals C_{2,1} to C_{2,n+3}, and third signals B_0 to B_{n+3}, which are the (n+4)-bit binary numbers from the conversion logic 110, and full-adds these inputs by means of the (n+4) full adders 121 to 125 to output (n+4)-bit carry values C_{1,0} to C_{1,n+3} and sum values S_{1,0} to S_{1,n+3}. Here, the most significant bit S_{2,n+3} of the first signals is input to the three highest full adders 123 to 125, and the most significant bit C_{2,n+3} of the second signals is input to the two highest full adders 124 and 125.
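The CSA1 120 adds three words without propagating carries between bit positions. The following Python sketch (the function name and the use of Python integers as bit vectors are illustrative assumptions) models one such row of full adders and checks the property the design relies on: the sum word plus the carry word equals the sum of the three inputs, modulo 2^width.

```python
def carry_save_add(x: int, y: int, z: int, width: int) -> tuple[int, int]:
    """Add three width-bit words with a row of independent full adders.

    Bit j of the sum word is x_j ^ y_j ^ z_j; the carry out of position j is
    the majority of the three bits and has weight 2**(j + 1), so the carry
    word is shifted left by one. The carry out of the top position is
    discarded, as with the MSB full adder of the CSA1.
    """
    mask = (1 << width) - 1
    x, y, z = x & mask, y & mask, z & mask
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word & mask


s, c = carry_save_add(0b1011, 0b0110, 0b0011, 4)
# No carry has rippled yet; one ordinary addition resolves the pair.
assert (s + c) & 0xF == (0b1011 + 0b0110 + 0b0011) & 0xF
```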
Fig. 4 is a block diagram showing a detailed configuration of the quotient logic 130 shown in Fig. 1.

Referring to Fig. 4, the quotient logic 130 receives as its inputs the sum values S_{1,0} and S_{1,1} output from the two lowest full adders and the carry value C_{1,0} output from the lowest full adder, selected from the (n+4)-bit carry values and sum values of the CSA1 120, and outputs a 3-bit determination value q2q1q0 that determines the multiple of the modulus used for modular reduction. The quotient logic 130 consists of a D flip-flop 132, a full adder 134, an exclusive OR (XOR) logic gate 136, and a combinational circuit 138. The D flip-flop 132 temporarily stores the carry input value Carry-in provided from the FA 160 of Fig. 1. The full adder 134 full-adds the carry input value Carry-in stored in the D flip-flop 132 and the sum value S_{1,0} output from the least significant full adder 121 of the CSA1 120. The XOR logic 136 performs an exclusive OR operation on the carry value C_{1,0} output from the least significant full adder 121 of the CSA1 120 and the sum value S_{1,1} output from the second full adder 122 of the CSA1 120. Each of the full adder 134 and the XOR logic 136 generates a preset carry value cin for correction, and the full adder 134 is also provided with the sign bit B sign of the multiplicand B, though this is not shown in Fig. 1. In the present invention, the carry value cin for correction output from the full adder 134 at the current clock is input to the full adder 160 of Fig. 1 as described above, and the full adder 160 determines, based on this carry value cin, the carry input value Carry-in to be used in the quotient logic 130 at the next clock. The combinational circuit 138 combines the output s0 of the full adder 134, the output s1 of the XOR logic 136, and a preset input bit n1, and outputs the 3-bit determination value q2q1q0.

Fig. 5 is a block diagram showing a detailed configuration of the CSA2 150 shown in Fig. 1.
Referring to Fig. 5, the CSA2 150 includes (n+4) full adders 151 to 156. The CSA2 150 receives the (n+4)-bit multiple of the modulus N (N_0 to N_{n+3}) selected by the selector 140 as a first input signal, the remaining (n+3)-bit carry values C_{1,0} to C_{1,n+2} of the CSA1 120, excluding the most significant carry bit, as a second input signal, and the remaining (n+3)-bit sum values S_{1,1} to S_{1,n+3} of the CSA1 120, excluding the least significant sum bit, as a third input signal, and outputs (n+4)-bit carry values C_{2,0} to C_{2,n+3} and (n+4)-bit sum values S_{2,0} to S_{2,n+3} by means of the (n+4) full adders 151 to 156. The (n+4) bits N_0 to N_{n+3} of the first input signal are input individually, starting from the least significant full adder 151, to the respective full adders 151 to 156; the (n+3) carry bits of the second input signal are sequentially input, starting from the second-lowest full adder 152, to the respective full adders 152 to 156; and the (n+3) sum bits of the third input signal are sequentially input, starting from the second-lowest full adder 152, to the respective full adders 152 to 156. The least significant full adder 151 receives the output s0 of the full adder 134 of the quotient logic 130, the output bit q2 (the sign bit) of the combinational circuit 138, and the least significant bit N_0 of the selected modulus multiple.

Fig. 6 is a block diagram showing a detailed configuration of the full adder 160 shown in Fig. 1.

Referring to Fig. 6, the full adder 160 full-adds the carry value C_{2,0} output from the least significant full adder 151 of the CSA2 150, the sum value S_{2,1} output from the second-lowest full adder 152, and the preset carry value cin for correction, and outputs the carry input value Carry-in as the result of the full add operation. The carry input value Carry-in is provided to the quotient logic 130.
B-2. Principle of the Invention

The present invention provides a device for calculating A • B • R^{-1} mod N in m+2 clocks with inputs A, B, and N, each having n bits (where R = 4^{m+2}, m = n/2, and -N < A, B < N). Three principles applicable to the implementation of the present invention will be described: a first principle of representing the multiplier A and the multiplicand B for modular multiplication, a second principle of calculating a one's-complement-based partial product using 2 bits of the multiplier A, and a third principle of applying the Booth conversion and the one's-complement-based partial product of the present invention to the Montgomery algorithm.

B-2.a. Number Representation

In the present invention, the multiplier A and the multiplicand B are represented by signed 
binary numbers for the modular multiplication. A and B, each having n bits, are respectively 
transformed to (n+4) bits for signed operation. During this transformation, any negative values are 
transformed to their one's complement. 
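As a small illustration of this representation, the following Python sketch encodes and decodes (n+4)-bit one's-complement words. The helper names are hypothetical, and the range check reflects the fact that (n+4)-bit one's complement can represent magnitudes only up to 2^{n+3} - 1. For n = 12, -B = -0x5C3 is encoded as 1111.1010.0011.1100, the B' value shown in Equation 3.

```python
def to_ones_complement(value: int, n: int) -> int:
    """Encode value as an (n + 4)-bit one's-complement word."""
    width = n + 4
    limit = 1 << (width - 1)
    if not -limit < value < limit:
        raise ValueError("value is outside the signed (n + 4)-bit range")
    all_ones = (1 << width) - 1
    # Non-negative values are stored as-is; negative ones as inverted |value|.
    return value if value >= 0 else all_ones ^ -value


def from_ones_complement(word: int, n: int) -> int:
    """Decode an (n + 4)-bit one's-complement word back to an integer."""
    width = n + 4
    all_ones = (1 << width) - 1
    return -(all_ones ^ word) if word >> (width - 1) else word


word = to_ones_complement(-0x5C3, 12)
print(f"{word:016b}")                        # 1111101000111100
assert from_ones_complement(word, 12) == -0x5C3
```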

B-2.b. Booth's Conversion

The present invention employs a modified Booth conversion system, which is a modification of the Booth conversion system well known to those skilled in the art to which the invention pertains, in order to increase the speed of the modular multiplication. By means of the modified Booth conversion system, the multiplier A is converted into 2-bit digits z_i (where 0 ≤ i ≤ m+1). Here, it is assumed that a_{n+4} = a_{n+3} and a_{-1} = 0. The following Table 1 shows the rule of the modified Booth conversion according to the present invention. In addition, as shown in Fig. 2, the multiplicand multiple 0, B, or 2B is output via the multiplexer 114 according to the two bit values z_{i+1}[1] and z_{i+1}[0]. To obtain a signed partial product -B or -2B, a one's complement operation is selectively performed on the output of the multiplexer 114 based on the sign bit z_{i+1}[2] of the Booth-converted value.



Table 1 
In Table 1, the Booth-converted result value z_{i+1} is expressed as a signed decimal number with the three bits z_{i+1}[2], z_{i+1}[1], and z_{i+1}[0], and z_{i+1} is also expressed as a binary number in brackets.
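Table 1 itself is not reproduced in this text, so the sketch below assumes it follows the standard radix-4 Booth rule, in which digit i equals -2·a_{2i+1} + a_{2i} + a_{2i-1}. This assumption is consistent with the A_i column of Table 4 for A = 0x93E. The sketch indexes digits by position (z_0, z_1, ...) rather than by the shift-register bit names used in Fig. 2, and the function names are hypothetical.

```python
def booth_radix4_digits(a: int, count: int) -> list[int]:
    """Radix-4 Booth digits z_0 .. z_{count-1} of a two's-complement integer.

    Digit i is formed from bits a_{2i+1}, a_{2i} and the reference bit
    a_{2i-1} (with a_{-1} = 0) as -2*a_{2i+1} + a_{2i} + a_{2i-1},
    which always lies in {-2, -1, 0, 1, 2}.
    """
    digits = []
    ref = 0                                # a_{-1} = 0
    for i in range(count):
        lo = (a >> (2 * i)) & 1            # a_{2i}
        hi = (a >> (2 * i + 1)) & 1        # a_{2i+1}
        digits.append(-2 * hi + lo + ref)
        ref = hi                           # reference bit for the next digit
    return digits


def digits_value(digits: list[int]) -> int:
    """Recombine radix-4 digits: sum of z_i * 4**i."""
    return sum(d * 4**i for i, d in enumerate(digits))


z = booth_radix4_digits(0x93E, 8)
print(z)                                   # [-2, 0, 0, 1, 1, -2, 1, 0]
assert digits_value(z) == 0x93E
```

Python's right shift on negative integers behaves as an infinite two's-complement shift, so the same function also yields valid digits for negative multipliers that fit in 2·count bits.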

B-2.c. Radix-4 Montgomery Algorithm Using Booth's Conversion

The algorithm illustrated in the following Equation 1 shows how the present invention employs the modified Booth conversion system and the one's complement operation for radix-4 Montgomery modular multiplication. The original Montgomery algorithm compares the result value with the modulus N and performs a subtraction if the result value is greater than the modulus N. The algorithm of the present invention, however, omits this comparison and subtraction.






PATENT 

Attorney Docket: 678-1395 (P10801) 



Equation 1 

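Input: $N$, $-N < A, B < N$

Output: $S = A \cdot B \cdot 4^{-(m+2)} \bmod N$, $-N < S < N$

$$
\begin{array}{ll}
S = 0 & (1)\\
\textbf{for } i = 0 \textbf{ to } m+1 & (2)\\
\qquad S = S + A_i \times B & (3)\\
\qquad q = (q_2 q_1 q_0) = f(s_1, s_0, n_1, n_0) & (4)\\
\qquad S = S + q \times N & (5)\\
\qquad S = S \gg 2 & (6)\\
\textbf{endfor} & (7)
\end{array}
$$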



In the algorithm of Equation 1, A_i in procedure (3) denotes the two Booth-converted bits of the multiplier and satisfies -2 ≤ A_i ≤ 2. Procedure (4) refers to a function that makes the two least significant bits of the result of procedure (5) equal to '0'. The result of procedure (4) depends on the input bits s1, s0, n1, and n0 and is determined as shown in the following Equation 2; because the modulus N is odd (n0 = 1), only s1, s0, and n1 appear there. q2, the most significant bit (MSB) of the value q used for modular reduction, is a sign bit. The remaining two bits q1q0 give the magnitude, so that q is selected from among the elements {0, ±1, 2}; its bits are calculated according to the following Equation 2, in which a bar denotes the logical complement of a bit.

Equation 2 

$$
\begin{aligned}
q_0 &= s_0\\
q_1 &= \bar{s}_0\, s_1\\
q_2 &= s_0\, s_1\, n_1 + s_0\, \bar{s}_1\, \bar{n}_1
\end{aligned}
$$
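The following Python sketch is a word-level model of Equation 1 combined with the quotient rule of Equation 2. It deliberately ignores the carry-save form and the one's-complement correction carries of the hardware, and all function names are hypothetical. The loop runs m+2 times so that the accumulated factor is 4^{-(m+2)}, and the internal assertion checks that each quotient digit clears the two least significant bits before the shift. For the operands of Equation 3 it returns S = -17 and the quotient sequence 2, 1, -1, 1, 1, 2, -1, 0, which agrees with the q2q1q0 column of Table 4.

```python
def booth_radix4_digits(a: int, count: int) -> list[int]:
    """Radix-4 Booth digits of a (two's-complement semantics), LSB first."""
    digits, ref = [], 0                      # ref plays the role of a_{-1} = 0
    for i in range(count):
        lo = (a >> (2 * i)) & 1
        hi = (a >> (2 * i + 1)) & 1
        digits.append(-2 * hi + lo + ref)
        ref = hi
    return digits


def quotient_digit(s1: int, s0: int, n1: int) -> int:
    """Equation 2: sign bit q2 and magnitude bits q1 q0 of the reduction multiple."""
    q0 = s0
    q1 = (1 - s0) & s1
    q2 = (s0 & s1 & n1) | (s0 & (1 - s1) & (1 - n1))
    magnitude = 2 * q1 + q0
    return -magnitude if q2 else magnitude


def montgomery_radix4(a: int, b: int, n_mod: int, n_bits: int) -> tuple[int, list[int]]:
    """Word-level model of Equation 1; returns (S, quotient digits)."""
    if n_mod % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    m = n_bits // 2
    n1 = (n_mod >> 1) & 1
    s, qs = 0, []
    for digit in booth_radix4_digits(a, m + 2):          # i = 0 .. m+1
        s += digit * b                                   # procedure (3)
        q = quotient_digit((s >> 1) & 1, s & 1, n1)      # procedure (4)
        s += q * n_mod                                   # procedure (5)
        assert s % 4 == 0, "two LSBs must be cleared"
        s >>= 2                                          # procedure (6): exact /4
        qs.append(q)
    return s, qs


A, B, N, n = 0x93E, 0x5C3, 0xA59, 12
S, qs = montgomery_radix4(A, B, N, n)
print(S, qs)                                  # -17 [2, 1, -1, 1, 1, 2, -1, 0]
assert (S * 4 ** (n // 2 + 2) - A * B) % N == 0   # S == A*B*4^-(m+2) (mod N)
```

Because the comparison and subtraction of the original Montgomery algorithm are omitted, S may be negative; the final correction is left to the carry propagation stage.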
Table 2

B-3. Operation of the Invention



The apparatus of the present invention shown in Fig. 1 calculates A • B • R^{-1} mod N in m+2 clocks with inputs A, B, and N, each having n bits (where R = 4^{m+2}, m = n/2, and -N < A, B < N).

A procedure by which the apparatus shown in Fig. 1 calculates A • B • R^{-1} mod N (where R = 4^{m+2}) will now be described. In the following description, step a) is an initialization step, steps b) to h) are performed every clock, and step i) is performed after steps b) to h) have been performed for (m+2) clocks.

a) A, B, and N, each having n bits, which are input for the modular multiplication, are stored in respective registers (or memories). Although the inputs A and B are shown stored in the respective registers 102 and 104 and no separate register for N is shown, it is apparent to those skilled in the art that such a separate register is used in the apparatus of the present invention. Here, the register 102 in which A is stored is a shift register in which A is shifted to the right by two bits every clock. For convenience, the register in which A is stored is called register A, and the register in which B is stored is called register B. When memories are used, A and B are read out one word at a time. Temporary registers (or memories) C and S (not shown in detail), in which the result of the calculation by the CSA2 150 shown in Fig. 1 is temporarily stored, are initialized to '0'.

b) When all data has been input into the registers 102 and 104, the Booth conversion circuit 112 of the conversion logic 110 performs the Booth conversion function based on the two LSBs in the register 102; that is, it finds the Booth-converted result value z_{i+1} from the two LSBs a_{i+1} and a_i of the register 102 and the reference bit a_{i-1}. The MUX 114 of the conversion logic 110 receives the value of B stored in the register 104 and selects 0, B, or 2B according to z_{i+1}. The one's complementer 116 then selectively performs a one's complement operation on the output of the MUX 114 based on the sign bit z_{i+1}[2] of the Booth-converted result value. As a result, the one's complementer 116 provides one of the values 0, ±B, and ±2B, represented as an (n+4)-bit number, as one of the three inputs of the CSA1 120.

c) The CSA1 120 adds three input signed binary numbers of (n+4) bits. The CSA1 120 is composed of (n+4) full adders 121 to 125. The carries generated in the full adders of the CSA1 120 are provided to the full adders of the CSA2 150 at the next stage, while the carry generated in the MSB full adder 125 is ignored.

d) The quotient logic 130 receives the output values S_{1,1}, C_{1,0}, and S_{1,0} from the CSA1 120, the Carry-in signal provided from the full adder 160, and the sign bit B sign of the multiplicand B, and calculates and outputs s1 and s0 by means of the full adder 134 and the XOR logic 136. The carry value cin for correction is input to the full adder 134 and is also provided as an input to the full adder 160, as a signal for correcting the difference between the conventional Booth conversion system using two's complement and the modified Booth conversion system of the present invention using one's complement.

e) The combinational circuit 138 of the quotient logic 130 receives s1 and s0 calculated in step d) and determines the 3-bit value q by means of the truth table of Table 2. Although a detailed circuit for determining the value of q from the truth table of Table 2 is not shown, it is apparent to those skilled in the art that such a circuit can be implemented with general logic gates.

f) The CSA2 150 receives the carry values and sum values output from the CSA1 120 in step c), and an (n+4)-bit signed binary number selected from 0, ±N, and ±2N, whose magnitude is determined by the two LSBs q1q0 of the value q obtained in step e), and performs an (n+4)-bit signed addition. The CSA2 150 is composed of (n+4) full adders 151 to 156. The least significant full adder 151 receives, as its carry input, the MSB q2 of the value q calculated in step e), that is, its sign bit, and receives as a sum input the value s0, which is the sum output bit of the full adder 134.

g) The full adder 160 receives the bits S_{2,1} and C_{2,0} of the outputs of the CSA2 150 and the carry value cin for correction, and outputs the Carry-in bit by full-adding these inputs. This full add operation corrects the difference between the conventional Booth conversion system using two's complement and the modified Booth conversion system of the present invention using one's complement.

h) The (n+2) upper sum bits S_{2,2} to S_{2,n+3} and the (n+3) upper carry bits C_{2,1} to C_{2,n+3} output from the CSA2 150 are fed back to the CSA1 120 as its inputs. At this time, S_{2,n+3}, the MSB of the sum output of the MSB full adder 156 of the CSA2 150, is copied so that two bits are added, and C_{2,n+3}, the MSB of the carry output of the MSB full adder 156, is copied so that one bit is added. The results of this copying of S_{2,n+3} and C_{2,n+3} are input to the CSA1 120: the sum bit S_{2,n+3} is provided to the three full adders 123 to 125 of the CSA1 120, and the carry bit C_{2,n+3} is provided to the two full adders 124 and 125 of the CSA1 120.
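The copying of S_{2,n+3} and C_{2,n+3} amounts to an arithmetic (sign-extending) right shift. The sum word is shifted by two positions, so its MSB lands at bit position n+1 (counting from 0) and two copies fill the positions above it, reaching the three highest full adders; the carry word, whose bit C_{2,j} already has weight 2^{j+1}, is shifted by one position, so its MSB reaches the two highest full adders. In both cases the effect is a division by 4. The Python sketch below assumes the words are two's-complement values and uses hypothetical helper names.

```python
def sign_extend_shift(word: int, shift: int, width: int) -> int:
    """Shift a width-bit word right, refilling the vacated top bits with its MSB."""
    msb = (word >> (width - 1)) & 1
    shifted = word >> shift
    if msb:
        shifted |= ((1 << shift) - 1) << (width - shift)   # copies of the MSB
    return shifted


def to_signed(word: int, width: int) -> int:
    """Read a width-bit word as two's complement."""
    return word - (1 << width) if word >> (width - 1) else word


W = 16                                      # n + 4 with n = 12
sum_word = 0b1111_0000_1111_0010            # an arbitrary 16-bit sum word
fed_back = sign_extend_shift(sum_word, 2, W)
print(f"{fed_back:016b}")                   # 1111110000111100
assert to_signed(fed_back, W) == to_signed(sum_word, W) >> 2
```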

i) The following operation is performed after steps b) to h) have been performed for (m+2) clocks. A carry propagation adder (CPA) (not shown) adds the carry value and the sum value output from the CSA2 150. If the result of this addition is negative, the modulus N is added to it; if the result is positive, the modulus N is not added.
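A minimal Python sketch of this final step follows. It assumes the saved sum and carry words are already aligned width-bit two's-complement values, and the function name is hypothetical; it only mirrors the rule stated above, adding N once when the propagated result is negative.

```python
def cpa_and_correct(sum_word: int, carry_word: int, width: int, n_mod: int) -> int:
    """Resolve the carry-save pair, then add N once if the result is negative.

    Assumes both words are aligned width-bit two's-complement values.
    """
    mask = (1 << width) - 1
    total = (sum_word + carry_word) & mask     # carry propagation adder
    if total >> (width - 1):                   # sign bit set: negative
        total -= 1 << width                    # reinterpret as a negative integer
        total += n_mod                         # add the modulus N
    return total


print(cpa_and_correct(0xF0, 0x03, 8, 29))      # -13 + 29 = 16
```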

For example, when each of A, B, and N has 12 bits, as shown in the following Equation 3 (where the values are written as (n+4) = 16-bit words), the result of the Montgomery modular operation according to the above-described procedure is as shown in the following Table 3 and Table 4.

Equation 3 

N   = 0000.1010.0101.1001 (0xA59)     B   = 0000.0101.1100.0011 (0x5C3)
N'  = 1111.0101.1010.0110             B'  = 1111.1010.0011.1100
2N  = 0001.0100.1011.0010             2B* = 1111.0100.0111.1001
A   = 0000.1001.0011.1110 (0x93E)
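The binary strings in Equation 3 can be regenerated from the hexadecimal operands. The Python sketch below does so under the reading, consistent with section B-2.a, that a prime or an asterisk denotes the 16-bit one's complement; the helper names are hypothetical.

```python
WIDTH = 16                                   # n + 4 bits with n = 12
MASK = (1 << WIDTH) - 1


def nibbles(x: int) -> str:
    """Format a WIDTH-bit word as dotted 4-bit groups, e.g. 0000.1010.0101.1001."""
    bits = f"{x & MASK:0{WIDTH}b}"
    return ".".join(bits[i:i + 4] for i in range(0, WIDTH, 4))


def ones_complement(x: int) -> int:
    """Invert all WIDTH bits of x."""
    return ~x & MASK


N, B, A = 0xA59, 0x5C3, 0x93E
rows = {
    "N": nibbles(N),
    "N'": nibbles(ones_complement(N)),
    "2N": nibbles(2 * N),
    "B": nibbles(B),
    "B'": nibbles(ones_complement(B)),
    "2B*": nibbles(ones_complement(2 * B)),
    "A": nibbles(A),
}
for name, text in rows.items():
    print(f"{name:>3} = {text}")
```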



Table 3 
Table 4

i      A_i   s1s0   c   q2q1q0   CSA2 out: s                CSA2 out: c                Carry-in
Init    0    00     0   000      0000.0000.0000.0000        0.0000.0000.0000.000       0
0      -2    10     1   010      (11).1110.0000.1100.1010   (0)0.0010.1000.0110.000
1       0    11     0   001      (11).1110.1000.0101.0010   (0)0.0010.0100.0101.001
2       0    01     0   101      (00).0001.0110.1000.1110   (1)1.1110.0010.0100.001
3       1    11     0   001      (11).1111.1001.1010.1110   (0)0.0001.0100.1010.001
4       1    11     0   001      (11).1111.1110.0000.1110   (0)0.0001.0101.1010.001
5      -2    10     1   010      (11).1111.0000.1111.0010   (0)0.0001.1101.0010.010
6       1    01     0   101      (00).0000.0001.1000.0010   (1)1.1111.1101.0110.111    1
7       0    00     1   000      1111.1111.1011.1010        0.0000.0000.0000.000       1



A procedure for calculating the modular multiplication A • B mod N using the results of the operation of the apparatus of the present invention described above will now be described. A hardware configuration for performing the procedure is apparent to those skilled in the art, and a detailed explanation thereof is therefore omitted. The following calculations are performed:

1) Calculate P = 2^{2(n+4)} mod N;

2) Calculate C = A • B • 2^{-(n+4)} mod N; and

3) Calculate P • C • 2^{-(n+4)} mod N = A • B mod N.
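The three calculations above can be checked numerically with a reference Montgomery product. In the Python sketch below, `mont` is a hypothetical helper that computes x·y·R^{-1} mod N directly with a modular inverse (`pow(r, -1, n)`, available from Python 3.8), not with the radix-4 hardware algorithm; it only demonstrates why multiplying by the precomputed P = R^2 mod N cancels the extra R^{-1} factor.

```python
def mont(x: int, y: int, n_mod: int, r: int) -> int:
    """Reference Montgomery product x*y*R^-1 mod N (computed directly)."""
    return x * y * pow(r, -1, n_mod) % n_mod


def modmul_via_montgomery(a: int, b: int, n_mod: int, n_bits: int) -> int:
    """A*B mod N obtained from two Montgomery products, as in steps 1) to 3)."""
    r = 1 << (n_bits + 4)                    # R = 2^(n+4)
    p = pow(2, 2 * (n_bits + 4), n_mod)      # 1) P = 2^(2(n+4)) mod N
    c = mont(a, b, n_mod, r)                 # 2) C = A*B*2^-(n+4) mod N
    return mont(p, c, n_mod, r)              # 3) P*C*2^-(n+4) mod N = A*B mod N


print(modmul_via_montgomery(0x93E, 0x5C3, 0xA59, 12))   # 1117
```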

A procedure for calculating the modular exponentiation m^e mod N, which is required to perform the RSA operation, using the results of the operation of the apparatus of the present invention described above will now be described. The following operations are performed:

1) Store an exponent e in a register (or a memory);

2) Store the modulus N in a register;

3) Initialize the temporary registers C and S to '0';

4) Perform the Montgomery modular multiplication m' = f_m(m, P, N) = m • P • R^{-1} mod N, where P is the value pre-calculated in step 1) of the aforementioned modular multiplication procedure, and R = 2^{n+4};

5) Load m' into the register B;

6) Perform a modular square operation using the value loaded into the register B, where the multiplier A required for the Montgomery modular multiplication is loaded from the register B and its digits are obtained by using the modified Booth conversion circuit;

7) Shift the exponent e to the left;

8) Ignore the leading '1' (MSB) of the exponent e and perform the subsequent steps 9) and 10) for each of the following bits;

9) Perform steps 4) and 5) for the modular square operation regardless of the current bit (0 or 1) of the exponent e, where the multiplier and the multiplicand required for the square operation are stored in the register A and the register B, respectively;

10) If the current bit of the exponent e is 1, perform steps 4) and 5) for the modular multiplication after performing step 9), where the multiplicand is the content of the register B and the multiplier is the base m' in the exponentiation; and

11) After performing steps 8) to 10) for all bits of the exponent e, perform the modular multiplication once more using step 4), where the multiplicand is the content of the register B and the multiplier is 1.
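Steps 4) to 11) describe a left-to-right square-and-multiply exponentiation carried out in the Montgomery domain. The Python sketch below mirrors that control flow with a reference Montgomery product (a hypothetical helper computed with a modular inverse rather than by the hardware). Registers A, B, C, and S are not modelled, and the exponent is assumed to be at least 1 so that its leading 1 can be skipped as in step 8).

```python
def mont(x: int, y: int, n_mod: int, r: int) -> int:
    """Reference Montgomery product x*y*R^-1 mod N (computed directly)."""
    return x * y * pow(r, -1, n_mod) % n_mod


def mont_exp(m: int, e: int, n_mod: int, n_bits: int) -> int:
    """Left-to-right square-and-multiply in the Montgomery domain (sketch)."""
    if e < 1:
        raise ValueError("the exponent must be at least 1")
    r = 1 << (n_bits + 4)                    # R = 2^(n+4)
    p = pow(2, 2 * (n_bits + 4), n_mod)      # P = R^2 mod N, precomputed
    m_bar = mont(m, p, n_mod, r)             # step 4): m' = m*P*R^-1 = m*R mod N
    x = m_bar                                # step 5): m' loaded into register B
    for bit in bin(e)[3:]:                   # step 8): skip the leading 1
        x = mont(x, x, n_mod, r)             # step 9): square for every bit
        if bit == "1":
            x = mont(x, m_bar, n_mod, r)     # step 10): multiply by m'
    return mont(x, 1, n_mod, r)              # step 11): multiply by 1 to leave the domain


print(mont_exp(0x93E, 17, 0xA59, 12) == pow(0x93E, 17, 0xA59))   # True
```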

If the result of the CPA applied to the values remaining in the registers C and S after performing steps 1) to 11) is negative, the modulus N is added to it. Otherwise, if the result is positive, it is the final value of the exponentiation, m^e mod N, and the modulus N is not added.

B-4. Effect of the Invention 

As is apparent from the above description, the present invention provides a circuit for calculating A • B • 2^{-(n+4)} mod N, which makes the general modular multiplication A • B mod N possible by means of the circuit. A • B mod N calculated according to the present invention is applicable to hardware apparatuses employed in devices for generating and verifying digital signatures. In addition, the present invention is applicable to hardware apparatuses for IC-card-based electronic signature generation, authentication, and encryption/decryption. The present invention can also provide devices for encrypting and decrypting data or information by means of the electronic signature apparatus that performs the modular multiplication. Furthermore, the present invention can be used to implement existing public key cryptography systems, such as NIST DSS, RSA, ElGamal, and Schnorr electronic signatures, based on the electronic signature apparatus.

C. Second Embodiment 

C-1. Configuration of the Invention

Fig. 7 is a block diagram showing a configuration of a modular multiplication apparatus in 
accordance with the second embodiment of the present invention. 

Referring to Fig. 7, the modular multiplication apparatus includes a conversion logic 210, a first carry save adder (hereinafter abbreviated as "CSA1") 220, a quotient logic 230, a selector 240, a second carry save adder (CSA2) 250, and an AND logic gate 260. The modular multiplication apparatus is a hardware device for calculating A • B • R^{-1} mod N in m+2 clocks according to a modified Montgomery algorithm, with inputs A, B, and N each having n bits (where R = 4^{m+2}, m = n/2, and -N < A, B < N). Namely, the modular multiplication apparatus is configured to calculate A • B • 2^{-(n+4)} mod N.

Each of the CSAs 220 and 250 is composed of (n+3) full adders in parallel, each of which has a 3-bit input and outputs a carry bit and a sum bit. The conversion logic 210 performs a modified Booth conversion operation based on the multiplier A, and selects and outputs one of the (n+3)-bit values 0, B, 2B, and 3B. The quotient logic 230 receives as its inputs the least significant bit (LSB) carry value C_{1,0} and the two LSB sum bits S_{1,1} and S_{1,0} from the CSA1 220, a Carry-in signal, and the sign bit of B, and outputs a 2-bit value q1q0 that determines the multiple of the modulus used for modular reduction. The selector 240, which can be implemented by multiplexers (MUXs), selects and outputs one of 0, N, 2N, and 3N based on the determined value of q. The AND logic 260 performs an AND operation on the two bits S_{2,1} and C_{2,0} output from the CSA2 250, and provides the result to the quotient logic 230 as the Carry-in signal.

Although not shown in detail in Fig. 7, it should be noted that the modular multiplication apparatus includes temporary storage registers C and R for storing, at each clock, the carry values and sum values output from the CSA2 250, and a carry propagation adder that adds the values stored in the temporary storage registers C and R and outputs the resulting value as the result of the modular multiplication.

Fig. 8 is a block diagram showing a detailed configuration of the conversion logic 210 shown in Fig. 7.

Referring to Fig. 8, the conversion logic 210 Booth-converts the two lesser bits of a bit string generated by sequentially shifting the bits of the multiplier A, multiplexes the multiplicand B according to the conversion result, and outputs binary numbers of (n+3) bits. For this purpose, a shift register 202 for sequentially shifting the bits of the multiplier A to generate a shifted bit string and a register 204 for storing the multiplicand B are provided at the front stage of the conversion logic 210. The conversion logic 210 also includes a multiplexer (MUX) 212, which multiplexes the multiplicand according to the two lesser bits a_{i+1} and a_i of the generated bit string and outputs 0, B, 2B, or 3B. The conversion logic 210, which is a circuit implementing a modified Booth conversion based on the multiplier A, thus selects and outputs one of the (n+3)-bit values 0, B, 2B, and 3B.
Fig. 9 is a block diagram showing a detailed configuration of the CSA1 shown in Fig. 7.

Referring to Fig. 9, the CSA1 220, having (n+3) full adders 221 to 225, receives as its inputs (n+1)-bit first signals S_{2,2} to S_{2,n+2}, (n+2)-bit second signals C_{2,1} to C_{2,n+2}, and third signals B_0 to B_{n+2}, which are the (n+3)-bit binary numbers from the conversion logic 210, and full-adds these inputs by means of the (n+3) full adders 221 to 225 to output (n+3)-bit carry values C_{1,0} to C_{1,n+2} and sum values S_{1,0} to S_{1,n+2}. The first and second signals are provided from the CSA2 250, and the third signals are provided from the conversion logic 210. The most significant bit S_{2,n+2} of the first signals is input to the third-highest full adder 223, and the most significant bit C_{2,n+2} of the second signals is input to the second-highest full adder 224. The most significant full adder 225 is provided with "0" as both its first and second signals, and the second-highest full adder 224 is provided with "0" as its first signal. Namely, the (n+1)-bit first signals S_{2,2} to S_{2,n+2} are sequentially input to the least significant full adder 221 through the (n+1)th full adder 223 of the CSA1 220, and "0" is input as the first signal to the (n+2)th full adder 224 and the (n+3)th full adder 225. In addition, the (n+2)-bit second signals C_{2,1} to C_{2,n+2} are sequentially input to the least significant full adder 221 through the (n+2)th full adder 224 of the CSA1 220, and "0" is input as the second signal to the (n+3)th full adder 225. Finally, the (n+3)-bit third signals B_0 to B_{n+2} are sequentially input to the least significant full adder 221 through the (n+3)th full adder 225 of the CSA1 220.

Fig. 10 is a block diagram showing a detailed configuration of the quotient logic 230 shown 
in Fig. 7. 

Referring to Fig. 10, the quotient logic 230 receives as its inputs the sum values S_{1,0} and S_{1,1} output from the two lowest full adders and the carry value C_{1,0} output from the lowest full adder, selected from the (n+3)-bit carry values and sum values of the CSA1 220, and outputs a 2-bit determination value q1q0 that determines the multiple of the modulus used for modular reduction. The quotient logic 230 consists of a D flip-flop 232, a half adder (HA) 234, an exclusive OR (XOR) logic gate 236, and a combinational circuit 238. The D flip-flop 232 temporarily stores the carry input value Carry-in input from the AND logic 260. The half adder 234 half-adds the carry input value Carry-in stored in the D flip-flop 232 and the sum value S_{1,0} output from the least significant full adder 221 of the CSA1 220. The XOR logic 236 performs an exclusive OR operation on the carry value C_{1,0} output from the least significant full adder 221 of the CSA1 220 and the sum value S_{1,1} output from the second-lowest full adder 222. The combinational circuit 238 combines the output s0 of the half adder 234, the output s1 of the XOR logic 236, and a preset input bit n1 to output the 2-bit determination value q1q0.
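The truth table used by the combinational circuit 238 is not given in this part of the description. The sketch below therefore shows the standard radix-4 Montgomery quotient rule, q = -S·N^{-1} mod 4, which is one way to obtain a value in {0, 1, 2, 3} that makes S + q·N divisible by 4; whether circuit 238 realizes exactly this mapping is an assumption. The function name is hypothetical, and S is assumed to already include the Carry-in added by the half adder 234.

```python
def quotient_digit_radix4(s: int, n_mod: int) -> int:
    """Return q in {0, 1, 2, 3} such that s + q*n_mod is divisible by 4 (n_mod odd)."""
    if n_mod % 2 == 0:
        raise ValueError("the modulus must be odd")
    # Every odd N satisfies N*N == 1 (mod 4), so N is its own inverse mod 4
    # and q = -s * N^-1 = -s * N (mod 4); only the two LSBs of s and N matter.
    return (-(s & 3) * (n_mod & 3)) % 4


N = 0xA59
for s in range(8):
    q = quotient_digit_radix4(s, N)
    print(s, q, (s + q * N) % 4)               # the last column is always 0
```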

Fig. 11 is a block diagram showing a detailed configuration of the CSA2 shown in Fig. 7.

Referring to Fig. 11, the CSA2 250 has (n+3) full adders 251 to 256. The CSA2 250 receives the (n+3)-bit multiple of the modulus N (N_0 to N_{n+2}) selected by the selector 240 as its first input signals, the remaining (n+2)-bit carry values C_{1,0} to C_{1,n+1} of the CSA1 220, excluding the most significant carry bit, as its second input signals, and the remaining (n+2)-bit sum values S_{1,1} to S_{1,n+2} of the CSA1 220, excluding the least significant sum bit, as its third input signals, and outputs (n+3)-bit carry values C_{2,0} to C_{2,n+2} and sum values S_{2,0} to S_{2,n+2} by means of the (n+3) full adders 251 to 256. The (n+3) bits of the first input signals are sequentially input, starting from the least significant full adder 251, to the respective full adders 251 to 256; the (n+2) bits of the second input signals are sequentially input, starting from the second-lowest full adder 252, to the respective full adders 252 to 256; and the (n+2) bits of the third input signals are sequentially input, starting from the second-lowest full adder 252, to the respective full adders 252 to 256. The least significant full adder 251 receives the output s0 of the half adder 234 of the quotient logic 230 and the carry input value Carry-in from the AND logic 260.

Fig. 12 is a block diagram showing a detailed configuration of the AND logic shown in Fig. 7.

Referring to Fig. 12, the AND logic 260 performs an AND operation on the carry value C2,0 output
from the least significant bit full adder 251 of the CSA2 250 and the sum value S2,1 output from the
second-lowest full adder 252 to produce the carry input value Carry-in. The carry input value
Carry-in is provided to the quotient logic 230.

C-2. Principle of the Invention 

The present invention provides a device for calculating $A \cdot B \cdot R^{-1} \bmod N$ in m+2 clocks,
taking as inputs A, B and N, each having n bits, where $R = 4^{m+2}$, $m = n/2$, and $-N < A, B < N$.
Two principles underlying the implementation of the present invention will now be described: a
first principle concerning the representation of the multiplier A and the multiplicand B for modular
multiplication, and a second principle concerning the Montgomery algorithm using the recoding
conversion of the present invention.

C-2.a. 2-Bit Scanning

In the present invention, the multiplier A is scanned (shifted) two bits at a time, starting from the
LSB, at each clock, and the scanned digit is multiplied by the multiplicand B; the result of the
multiplication is used in the Montgomery algorithm. Thus, the digit $a_i$ generated in each loop, which
is one of the elements {0, 1, 2, 3}, is multiplied by the multiplicand B, and the result is input to the
CSA1 220.
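As a rough software model of this 2-bit scanning, assuming an unsigned multiplier so that every digit lies in {0, 1, 2, 3}, the multiplier can be split into radix-4 digits, least significant first, and each digit used to pick a precomputed multiple of B, much as a 4-to-1 multiplexer would. The function names are hypothetical and are not elements of the apparatus; the sample values are taken from the 12-bit example of Equation 6.

```python
def scan_radix4_digits(a, num_digits):
    """Split a non-negative multiplier into 2-bit digits, least significant first.

    Models register A being shifted right by two bits at every clock.
    """
    digits = []
    for _ in range(num_digits):
        digits.append(a & 0b11)  # two LSBs currently visible in register A
        a >>= 2                  # shift right by two bits for the next clock
    return digits


def select_multiple(digit, b):
    """Return digit * B by selecting from precomputed multiples, like a 4-to-1 MUX."""
    multiples = (0, b, 2 * b, 3 * b)  # 0, B, 2B and 3B prepared in advance
    return multiples[digit]


if __name__ == "__main__":
    A, B = 0x93E, 0x5C3  # 12-bit values taken from the example of Equation 6
    for i, d in enumerate(scan_radix4_digits(A, 6)):
        print(f"clock {i}: a_i = {d}, a_i * B = {hex(select_multiple(d, B))}")
```

Precomputing 2B and 3B once means each clock needs only a selection, not a multiplication, which is the reason the text stores them in separate registers.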

C-2.b. Radix-4 Montgomery Algorithm 

The algorithm shown in Equation 4 illustrates the radix-4 Montgomery modular multiplication
employed by the present invention. The original Montgomery algorithm compares the result value
with the modulus N and performs a subtraction if the result value is greater than the modulus N. The
algorithm of the present invention, however, omits this comparison and subtraction.



Equation 4

Input: $N$, $-N < A, B < N$
Output: $S = A \cdot B \cdot 4^{-m-2} \bmod N$, $0 < S < N$

(1) $S = 0$
(2) for $i = 0$ to $(n+1)/2$
(3)     $S = S + a_i \times B$
(4)     $q_{i(1,0)} = f(s_1, s_0, n_1, n_0)$
(5)     $S = S + q_i \times N$
(6)     $S = S / 2^2$
(7) endfor

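The loop of Equation 4 can be sketched in software as follows, assuming unsigned inputs and keeping S as a single integer rather than in the carry-save form used by the hardware. As printed, Equation 4 bounds the loop index by (n+1)/2; this sketch instead runs m + 2 iterations so that the scaling factor equals $R = 4^{m+2}$, matching the clock count stated in the text. The quotient digit uses the rule given as Equation 5 of this section. The function name is hypothetical.

```python
def radix4_montgomery(a, b, n_mod, n_bits):
    """Software model of the loop of Equation 4 for unsigned inputs.

    Returns a value congruent to a * b * 4**-(m+2) modulo n_mod, m = n_bits // 2.
    S is kept as one integer here rather than in carry-save form.
    """
    if n_mod % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus (n0 = 1)")
    m = n_bits // 2
    if a >= 4 ** (m + 2):
        raise ValueError("multiplier has more digits than the loop scans")
    n1 = (n_mod >> 1) & 1                  # second LSB of N
    s = 0                                  # (1)
    for i in range(m + 2):                 # (2) m + 2 clocks
        a_i = (a >> (2 * i)) & 0b11        # 2-bit digit of the multiplier
        s += a_i * b                       # (3)
        s0, s1 = s & 1, (s >> 1) & 1       # two LSBs of the partial result
        q1 = (s0 & (1 - s1) & (1 - n1)) | ((1 - s0) & s1) | (s1 & n1)
        q = (q1 << 1) | s0                 # (4) q1 from Equation 5, q0 = s0
        s += q * n_mod                     # (5)
        assert (s & 0b11) == 0             # the two LSBs are now zero
        s >>= 2                            # (6) exact division by 2**2
    return s


print(hex(radix4_montgomery(0x93E, 0x5C3, 0xA59, 12)))  # 0xa48
```

With the 12-bit operands of Equation 6 the loop runs eight times and returns 0xA48, the final result quoted for that example. The assertion inside the loop documents why procedure (6) is an exact division.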
In the algorithm of Equation 4, $a_i$ in procedure (3) is the 2-bit digit scanned from the multiplier.
Procedure (4) is a function that makes the two least significant bits of the result value of procedure
(5) equal to '0', so that the division by 4 in procedure (6) is exact. The result of procedure (4)
depends on the input bits s1, s0, n1 and n0; for Montgomery modular multiplication it is determined
as shown in Table 5, since N is an odd number and n0 is therefore always 1. The value $q_i$ used for
the modular reduction is one of the elements {0, 1, 2, 3} and is calculated according to Equation 5.

Equation 5

$q_0 = s_0$

$q_1 = s_0 \bar{s}_1 \bar{n}_1 + \bar{s}_0 s_1 + s_1 n_1$
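Because the two least significant bits of S + q·N depend only on s1, s0, n1 and n0 = 1, Equation 5 can be checked exhaustively. The sketch below (the function name is hypothetical) enumerates all eight cases, asserts that S + q·N is divisible by 4 in each, and prints q as a function of s1, s0 and n1, which is the kind of 2-bit truth table Table 5 is described as containing.

```python
def quotient_digit(s1, s0, n1):
    """q = (q1, q0) from Equation 5; n0 is always 1 because N is odd."""
    q0 = s0
    q1 = (s0 & (1 - s1) & (1 - n1)) | ((1 - s0) & s1) | (s1 & n1)
    return (q1 << 1) | q0


# Exhaustive check: only the two LSBs of S and N affect (S + q*N) mod 4.
for n1 in (0, 1):
    for s1 in (0, 1):
        for s0 in (0, 1):
            q = quotient_digit(s1, s0, n1)
            s_low, n_low = 2 * s1 + s0, 2 * n1 + 1
            assert (s_low + q * n_low) % 4 == 0
            print(f"n1={n1} s1={s1} s0={s0} -> q1q0={q:02b}")
```

Equivalently, q is $-S \cdot N^{-1} \bmod 4$; since N is odd, $N^{-1} \equiv N \pmod 4$, which is why only n1 enters the formula.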



Table 5 
C-3. Operation of the Invention

The apparatus of the present invention shown in Fig. 7 calculates $A \cdot B \cdot R^{-1} \bmod N$ in m+2
clocks, taking as inputs A, B and N, each having n bits, where $R = 4^{m+2}$, $m = n/2$, and $-N < A, B < N$.

A procedure by which the apparatus shown in Fig. 7 calculates $A \cdot B \cdot R^{-1} \bmod N$ (where
$R = 4^{m+2}$) will now be described. In the following description, step a) is an initialization step,
steps b) to h) are performed at every clock, and step i) is performed after steps b) to h) have been
repeated for (m+2) clocks.

a) A, B, and N, each consisting of n bits and input for the modular multiplication, are stored in
respective registers (or memories). In addition, 2B and 3B, each of n+2 bits, are stored in respective
registers (or memories). Although Fig. 7 shows only the registers 202 and 204 storing the inputs A
and B, and not separate registers for 2B and 3B, it is apparent to those skilled in the art that such
separate registers are used in the apparatus of the present invention. The register 202 storing A is a
shift register in which A is shifted to the right by two bits at each clock. The register storing A is
referred to as register A, and the register storing B is referred to as register B. When memories are
used, A and B are read one word at a time. Temporary registers (or memories) C and S (not shown in
detail), which temporarily store the results calculated by the CSA2 250 of Fig. 7, are initialized to
'0'.

b) When all data has been input to the registers 202 and 204, the recoding conversion logic 210
performs a Booth recoding conversion based on the two LSBs of the register A 202. The MUX 212 of
the recoding conversion logic 210 receives the value stored in the register B 204 and, based on the
two LSBs of the register A 202, selects one of 0, B, 2B and 3B, which is provided as one of the three
inputs of the CSA1 220.

c) The CSA1 220 adds three (n+3)-bit binary input numbers. The CSA1 220 is composed of n+3
full adders 221 to 225.

d) The quotient logic 230 receives the output values S1,1, C1,0 and S1,0 of the CSA1 220 and the
Carry-in signal provided from the AND logic 260, and calculates and outputs s1 and s0 by means of
the half adder 234 and the exclusive OR logic 236.

e) The combinational circuit 238 of the quotient logic 230 receives s1 and s0 calculated in step d)
and determines a 2-bit value q according to the truth table of Table 5. Although a detailed circuit for
determining q from the truth table of Table 5 is not shown, it is apparent to those skilled in the art
that such a circuit can be implemented with general logic gates.

f) The CSA2 250 receives the carry values and sum values output by the CSA1 220 in step c),
together with an (n+3)-bit binary number selected from 0, N, 2N and 3N according to the two bits of
the value q obtained in step e), and performs an (n+3)-bit unsigned addition. Like the CSA1 220, the
CSA2 250 is composed of n+3 full adders 251 to 256. It should be noted that the LSB full adder 251 of
the full adders 251 to 256 receives, as its carry input, the Carry-in signal generated in the previous
stage.

g) The AND logic 260 receives the S2,1 and C2,0 bits of the output values of the CSA2 250 and
outputs the Carry-in bit by performing an AND operation on them.

h) The (n+2) sum values and (n+3) carry values, counted from the MSBs, of the outputs of the
CSA2 250 are fed back to the CSA1 220 as its inputs. For this feedback, the values are shifted two bits
to the right, so that the two highest bits of the sum values and the highest bit of the carry values
become '0'. The sum value S2,n+2 output from the full adder 256 of the CSA2 250 is provided to the
third-highest full adder 223 of the CSA1 220, and a sum value of '0' is provided to the MSB full adder
225 and the second-highest full adder 224. The carry value C2,n+2 output from the full adder 256 of
the CSA2 250 is provided to the second-highest full adder 224 of the CSA1 220, and a carry value of
'0' is provided to the MSB full adder 225.

i) After steps b) to h) have been performed for (m+2) clocks, a carry propagation adder (CPA) (not
shown) adds the carry value and the sum value output from the CSA2 250.
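The data flow of steps a) to i) can be approximated by a clock-level software model. This is a sketch under several assumptions, not a description of the actual circuit: Python integers stand in for the (n+3)-bit registers, so widths and overflow are not modelled; the quotient digit is computed directly from the low bits of the CSA1 outputs rather than through the half adder 234 and XOR gate 236; and the Carry-in bit is folded into bit 0 of the CSA1 carry vector, which is always free because every carry vector is shifted up by one position. The names `csa` and `montgomery_carry_save` are hypothetical.

```python
def csa(x, y, z):
    """3:2 carry-save adder over whole words: returns (sum, carry), sum + carry == x + y + z."""
    s = x ^ y ^ z                             # sum output of every full adder
    c = ((x & y) | (x & z) | (y & z)) << 1    # carry output, moved to the next bit position
    return s, c


def montgomery_carry_save(a, b, n_mod, n_bits):
    """Clock-level model of steps a) to i) for unsigned inputs (widths not modelled)."""
    m = n_bits // 2
    n1 = (n_mod >> 1) & 1
    s, c, carry_in = 0, 0, 0                        # a) registers S and C cleared
    for i in range(m + 2):
        a_i = (a >> (2 * i)) & 0b11                 # b) two LSBs of register A
        s1v, c1v = csa(s, c, a_i * b)               # c) CSA1
        c1v |= carry_in                             # Carry-in enters at weight 1 (bit 0 of c1v is free)
        low = (s1v + c1v) & 0b11                    # d) two LSBs of the represented value
        s0, s1 = low & 1, low >> 1
        q1 = (s0 & (1 - s1) & (1 - n1)) | ((1 - s0) & s1) | (s1 & n1)
        q = (q1 << 1) | s0                          # e) quotient digit per Equation 5
        s2v, c2v = csa(s1v, c1v, q * n_mod)         # f) CSA2 adds 0, N, 2N or 3N
        carry_in = ((s2v >> 1) & 1) & ((c2v >> 1) & 1)  # g) AND of the weight-2 sum and carry bits
        s, c = s2v >> 2, c2v >> 2                   # h) shift right by two bits and feed back
    return s + c + carry_in                         # i) carry propagation addition


print(hex(montgomery_carry_save(0x93E, 0x5C3, 0xA59, 12)))  # 0xa48
```

The AND in step g) works because, after q·N is added, the represented value is a multiple of 4: bit 0 of the sum vector is then 0 and the weight-2 sum and carry bits are equal, so their only possible carry into bit 2 is their AND. Keeping the value in two vectors avoids a full carry propagation on every clock; only step i) propagates carries.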

For example, if each of A, B and N has 12 bits as shown in Equation 6, the results of the Montgomery
modular multiplication according to the above-described procedure are as shown in Table 6 and Table
7. The final result of the operation is:
Final result: 0111.1100.0111 (0x7C7) + 0010.1000.0000 (0x280) + 1 = 1010.0100.1000 (0xA48)

Equation 6

N  = 000.1010.0101.1001 (0xA59)     B  = 000.0101.1100.0011 (0x5C3)
2N = 001.0100.1011.0010 (0x14B2)    2B = 000.1011.1000.0110 (0xB86)
3N = 001.1111.0000.1011 (0x1F0B)    3B = 001.0001.0100.1001 (0x1149)
A  = 000.1001.0011.1110 (0x93E)
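The numbers in this example can be cross-checked with plain integer arithmetic. The sketch below assumes n = 12, so $R = 2^{n+4} = 2^{16}$, and uses Python's built-in `pow` with exponent -1 (Python 3.8 or later) to form $R^{-1} \bmod N$; it confirms the listed multiples and that the three terms of the final result sum to $A \cdot B \cdot R^{-1} \bmod N$.

```python
N, A, B = 0xA59, 0x93E, 0x5C3
n = 12
R = 2 ** (n + 4)  # R = 4**(m+2) = 2**(n+4) with m = n/2

# Multiples listed in Equation 6
assert (2 * N, 3 * N) == (0x14B2, 0x1F0B)
assert (2 * B, 3 * B) == (0xB86, 0x1149)

expected = A * B * pow(R, -1, N) % N  # A * B * R^-1 mod N (pow with -1 needs Python 3.8+)
final = 0x7C7 + 0x280 + 1            # the three terms of the final result above
print(hex(expected), hex(final))     # 0xa48 0xa48
```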

Table 6 
0000.0011.0000.011
Table 7
A procedure for calculating the ordinary modular multiplication $A \cdot B \bmod N$ using the results of
the above-described apparatus is as follows. A hardware configuration for performing this procedure
is apparent to those skilled in the art, and a detailed explanation thereof is therefore omitted. The
following calculations are performed:

1) Calculate $P = 2^{2(n+4)} \bmod N$;

2) Calculate $C = A \cdot B \cdot 2^{-(n+4)} \bmod N$; and

3) Calculate $P \cdot C \cdot 2^{-(n+4)} \bmod N = A \cdot B \bmod N$.
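The three steps can be illustrated with a reference model in which one pass of the apparatus is replaced by exact modular arithmetic, $x \cdot y \cdot 2^{-(n+4)} \bmod N$. The helper names `mont` and `modmul_via_montgomery` are hypothetical, and the model assumes an odd modulus (required for the inverse of $2^{n+4}$ to exist) and Python 3.8 or later.

```python
def mont(x, y, n_mod, n_bits):
    """Reference model of one pass: x * y * 2**-(n_bits+4) mod n_mod (exact arithmetic)."""
    r_inv = pow(2 ** (n_bits + 4), -1, n_mod)  # needs an odd modulus and Python 3.8+
    return x * y * r_inv % n_mod


def modmul_via_montgomery(a, b, n_mod, n_bits):
    p = pow(2, 2 * (n_bits + 4), n_mod)  # 1) P = 2^(2(n+4)) mod N
    c = mont(a, b, n_mod, n_bits)        # 2) C = A * B * 2^-(n+4) mod N
    return mont(p, c, n_mod, n_bits)     # 3) P * C * 2^-(n+4) mod N = A * B mod N


print(modmul_via_montgomery(0x93E, 0x5C3, 0xA59, 12))  # 1117, i.e. 0x93E * 0x5C3 mod 0xA59
```

The factor $2^{2(n+4)}$ in P cancels the two factors of $2^{-(n+4)}$ introduced by the two passes, which is why P is computed once per modulus and reused.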

Next, a procedure for calculating the modular exponentiation $m^e \bmod N$ required for the RSA
operation, using the results of the above-described apparatus, will be described. The procedure is as
follows:

1) Store an exponent e in a register (or a memory); 

2) Store a modulus N in the temporary register C; 

3) Initialize the temporary registers C and S to '0'; 

4) Perform the Montgomery modular multiplication $m' = f_m(m, P, N) = m \cdot P \cdot R^{-1} \bmod N$,
where P is the pre-calculated value defined in the aforementioned procedure and
$R = 2^{n+4}$;

5) Load m' into the register B; 

6) Perform a modular square operation using the value loaded into the register B, where the
multiplier A required for the Montgomery modular multiplication is loaded from the
register B and its digits are obtained by the radix-4 recoding conversion circuit;

7) Shift the exponent e to the left; 

8) Skip the most significant '1' bit of the exponent e, and perform steps 9) and 10) for each subsequent bit;

9) Regardless of whether the current bit of the exponent e is 0 or 1, perform steps 4) and 5)
for the modular square operation; the multiplier and the multiplicand required for the square
operation are stored in the register A and the register B, respectively;

10) If the current bit of the exponent e is 1, perform steps 4) and 5) for the modular
multiplication after performing step 9); here, the multiplicand is the content of the
register B and the multiplier is the base m' of the exponentiation; and

11) After performing steps 8) to 10) for all bits of the exponent e, perform the modular
multiplication of step 4) once more, with the content of the register B as the multiplicand
and 1 as the multiplier.

After steps 1) to 11) have been performed, the CPA adds the values remaining in the registers C and
S, and the result is the final value of the exponentiation, $m^e \bmod N$.
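The control flow of steps 4) to 11) is a left-to-right square-and-multiply carried out in the Montgomery domain. The sketch below models it under stated assumptions: one pass of the apparatus is again replaced by the exact reference model $x \cdot y \cdot 2^{-(n+4)} \bmod N$ (hypothetical helper `mont`), register transfers are not modelled, the modulus is odd, and e is at least 1. The function name `mont_exp` is hypothetical.

```python
def mont(x, y, n_mod, n_bits):
    """Reference model of one pass: x * y * 2**-(n_bits+4) mod n_mod (exact arithmetic)."""
    return x * y * pow(2 ** (n_bits + 4), -1, n_mod) % n_mod


def mont_exp(base, e, n_mod, n_bits):
    """Left-to-right binary exponentiation in the Montgomery domain."""
    if e < 1:
        raise ValueError("exponent must be at least 1")
    p = pow(2, 2 * (n_bits + 4), n_mod)        # pre-calculated P = 2^(2(n+4)) mod N
    base_m = mont(base, p, n_mod, n_bits)      # step 4): m' = m * P * R^-1 = m * R mod N
    x = base_m                                 # the leading 1 bit of e is consumed here
    for bit in bin(e)[3:]:                     # step 8): remaining bits, MSB first
        x = mont(x, x, n_mod, n_bits)          # step 9): square for every bit
        if bit == "1":
            x = mont(x, base_m, n_mod, n_bits) # step 10): multiply by m'
    return mont(x, 1, n_mod, n_bits)           # step 11): multiply by 1 to leave the domain


print(mont_exp(0x93E, 65537, 0xA59, 12) == pow(0x93E, 65537, 0xA59))  # True
```

Each intermediate value carries one factor of R, so squaring and multiplying by m' keep exactly one factor of R after the built-in $R^{-1}$ of each pass; the final multiplication by 1 removes it.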

C-4. Effect of the Invention 

As is apparent from the above description, the present invention provides a circuit for calculating
$A \cdot B \cdot 2^{-(n+4)} \bmod N$, by means of which the general modular multiplication $A \cdot B \bmod N$
can be performed. The calculation of $A \cdot B \bmod N$ according to the present invention is
applicable to hardware apparatuses employed in devices for generating and verifying digital
signatures. In addition, the present invention is applicable to hardware apparatuses for IC card-based
electronic signatures, authentication and encryption/decryption. The present invention can also
provide devices for encrypting and decrypting data or information by means of an electronic
signature apparatus that performs the modular multiplication. Furthermore, based on such an
electronic signature apparatus, the present invention can be used to implement existing public key
cryptosystems such as NIST-DSS, RSA, ElGamal, and Schnorr electronic signatures.

D. Example of Application of the Invention. 
Fig. 13 is a block diagram of an IC card capable of performing encryption and electronic signature
by using the Montgomery type modular multiplication apparatus disclosed in the present application.

In Fig. 13, a central processing unit (CPU) 310 decodes instructions for encryption, authentication
and electronic signature, and provides the control signals and data required for modular calculation
to the coprocessor 330. A read only memory (ROM) 350 contains a security module for securing data,
for example, a key required for encryption and electronic signature. Control logic 320 and random
access memory (RAM) 340 are also shown, and provide the logic and memory needed to perform the
above operations.

Although the preferred embodiments of the present invention have been disclosed for 
illustrative purposes, those skilled in the art will appreciate that various modifications, additions 
and substitutions are possible, without departing from the scope and spirit of the invention as 
disclosed in the accompanying claims. 
AMENDMENTS TO THE CLAIMS

1. (Currently Amended) A signal processing apparatus for performing modular
multiplication for use in a signal processing system, the apparatus comprising:

a first logic for outputting a signed multiplicand by selectively performing a one's
complementary operation on a multiplicand according to a Booth conversion result of a multiplier in
modular multiplication;

a second logic for outputting a modulus which is signed in the modular multiplication based
on a carry input value Carry-in of a current clock, determined from a carry value cin for correction
of a previous clock, and on a sign bit of the multiplicand; and

a third logic for receiving the signed multiplicand and the signed modulus, and calculating a
result value of the modular multiplication by iteratively performing a full addition operation on a
carry value C and a sum value S of the full addition operation, found at the previous clock.

A modular multiplication device for implementing an information encryption/decryption
technique in which message (A) is encrypted/decrypted using a first key (B) and a second key (N),
comprising:

a storage device for storing the message, the first key, and the second key, each being n bits
in length;

a recording logic for generating at each clock a first n+4 bit signal using the message and the
first key;

a first carry save adder for generating a 3bit sequence consisting of one carry value and two
sum values using the first n+4 bit signal and two parallel n+4 bit input signals;

a quotient logic for generating a 3bit determiner for determining a modular reduction
multiple using the 3bit sequence and one carry value;

a selector for generating a second n+4bit signal using the second key and the 3bit determiner;

a second carry save adder for generating a pair of sum values and a pair of carry values using
the second n+4 bit signal, and respective sum and carry terms outputted from the first carry adder;

a first full adder for generating a carry input value by performing a full addition operation
with the pair of sum and carry values and a preset circuitry carry value (cin) outputted from the
quotient logic at a previous clock.

2. (Cancelled) 

3. (Currently Amended) The device apparatus of claim 1, wherein the message is right shifted
by 2 bit positions at every clock the first logic receives two least significant bits of the multiplier and
a predetermined reference bit while sequentially shifting bits of the multiplier, and performs the
Booth conversion thereon.

4. (Cancelled) 

5. (Cancelled) 

6. (Currently Amended) The device apparatus of claim [[1]] 3, wherein the recording logic
comprises:

a booth recording circuit for performing booth recoding with the two least significant bits
of the message;

a multiplexer for multiplexing the two least significant bits and the first key, and
outputting one of 0, B, and 2B; and

a one's complementary operator for performing a one's complementary operation on the
n+1bit signal outputted from the multiplexer according to the two least significant bits and
generating one of 0, B, 2B, B, and 2B the first logic comprises:

a Booth conversion circuit for performing the Booth conversion using the two least
significant bits of the multiplier and the reference bit;

a multiplexer for multiplexing the multiplicand based on the two least significant bits of the
multiplier; and

a one's complementer for outputting the signed multiplicand by selectively performing the
one's complementary operation on the output of the multiplexer based on a sign bit of the Booth
conversion result.

7. (Currently Amended) The device apparatus of claim 1, wherein the first carry save adder
comprises n+4 second full adders, each performing a full addition operation with corresponding sum
and carry bits of the two parallel n+4 input signals and corresponding bit of the first n+4 bit signal,
and producing the 3bit sequences the third logic performs the full addition operation using at least
two Carry Save Adders (CSAs) each including a plurality of full adders.

8. (Cancelled) 

9. (Cancelled) 

10. (Cancelled) 

11. (Cancelled) 

12. (Currently Amended) The device apparatus of claim 1, wherein the quotient second logic
comprises:

a D flip flop for temporally storing the carry input value from the first full adder;

a third full adder for performing a full addition operation on the carry input value, a sum
value outputted from a least significant full adder of the first carry save adder, and a sign bit of the
first n+4 bit signal in;

an exclusive OR (XOR) logic gate for performing on exclusive OR operation on the carry
value outputted from the least significant full adder of the first carry save adder, a sum value
outputted from a secondly least significant full adder of the first carry save adder, and the carry value
of the third full adder; and

a combinational circuit for combining the outputs of the third full adder, exclusive OR logic
gate and a second least significant bit of the second key, and outputting the 3bit determiner signal a
quotient logic for determining at every clock first bit values which are extracted by as many values as
a predetermined number of bits, beginning from a least significant bit for each of the carry value and
the sum value calculated in the third logic, and second bit values for determining a multiple of
modular reduction in the modular multiplication based on the carry input value Carry-in and a sign
bit of the multiplicand; and

a selector for selecting the signed modulus based on the second bit values.

13. (Cancelled) 

14. (Currently Amended) The device apparatus of claim 1, wherein the first full adder
performs a full addition with a sum value output from a second least significant full adder of the
second carry save adder, a carry value outputted from a least significant full adder of the second carry
save adder, and a carry value (cin) outputted from the quotient logic at a previous clock, and
produces the carry input value third logic further comprises a full adder for outputting the carry input
value Carry-in by performing the full addition operation using the carry value cin for correction and
the sign bit of the multiplicand, received from the second logic.
15. (Currently Amended) The device apparatus of claim 1 further comprising a carry
propagation adder for performing a carry propagation addition operation with the sum and carry
terms outputted from the second carry save adder after m+2 clock, where m=n/2. wherein the third
logic performs a carry propagation addition operation on the carry value and the sum value output
from the third logic after (m+2) clocks, where m=n/2, when each of the multiplier, the multiplicand
and the modulus has n bits.

16. (Currently Amended) The device apparatus of claim 15, wherein the carry propagation
adder adds modulus second key to a result of the carry propagation addition operation if an output of
the carry propagation adder is negative value the third logic adds the modulus to the carry propagation
addition operation result when a result of the carry propagation addition operation is a negative
number.

17. (Cancelled) 

18. (Cancelled) 

19. (Cancelled) 

20. (Cancelled) 

21. (Cancelled) 

22. (Cancelled) 

23. (Cancelled) 
24. (Cancelled)

25. (Cancelled) 

26. (Cancelled) 

27. (Cancelled) 

28. (Cancelled) 

29. (Cancelled) 

30. (Cancelled) 

31. (Cancelled) 

32. (Currently Amended) A modular multiplication method for implementing a message
encryption/decryption technique in which a message (A) is encrypted/decrypted using a first key (B)
and a second key (N), comprising:

storing the message, first key, and second key each of n bits;

generating a first n+4 bit signal using the message and first key at each clock;

outputting a 3bit sequence consisting of one carry value and two sum values by
performing a first carry save addition operation with the first n+4 bit signal and two parallel n+4
bit input signals;

generating a 3 bit determiner for determining a modular reduction multiple by performing
a quotient operation with the 3bit sequence and one input carry value;

generating a second n+4 bit signal using the second key and the 3bit determiner;

outputting a pair of sum values and a pair of carry values by performing a second carry
save addition operation with the second n+4 bit signal, and respective sum and carry terms output
from the first carry save addition operation;

outputting a carry input value by performing a full addition operation with the pair of sum
and carry values and a carry value outputted from the quotient logic at a previous clock.

A signal processing method for performing modular multiplication for use in a signal
processing system, the method comprising:

outputting a signed multiplicand by selectively performing a one's complementary operation
on a multiplicand according to a Booth conversion result of a multiplier in modular multiplication;

finding a carry input value Carry-in of a current clock determined from a carry value cin for
correction of a previous clock;

outputting a modulus which is signed in the modular multiplication based on the carry input
value and a sign bit of the multiplicand; and

receiving the signed multiplicand and the signed modulus, and calculating a result value
of the modular multiplication by iteratively performing a full addition operation on a carry value
C and a sum value S of the full addition operation, found at the previous clock.

33. (Currently Amended) The method of claim 32, wherein the message is right shifted
by 2 bits at every clock outputting a signed multiplicand comprises:

receiving two least significant bits of the multiplier and a predetermined reference bit
while sequentially shifting bits of the multiplier, and performing the Booth conversion thereon.
34. (Currently Amended) The method of claim 33 [[32]], wherein generating the first
n+4 bit signal includes the outputting of a signed multiplicand comprises:

performing booth recording with the two least significant bits of the message; and

generating, one of 0, B, 2B, B, and 2B according to the two least significant
bits performing the Booth conversion using the two least significant bits of the multiplier and the
reference bit;

multiplexing the multiplicand based on the two least significant bits of the multiplier; and

outputting the signed multiplicand by selectively performing the one's complementary
operation on the output of the multiplexed multiplicand based on a sign bit of the Booth
conversion result.

35. (Cancelled) 

36. (Cancelled) 

37. (Cancelled) 

38. (Cancelled) 

39. (Cancelled) 

40. (Cancelled) 
41. (Cancelled)

42. (Currently Amended) The method of claim 33 [[32]], wherein the one input carry
value is the carry input value generated by the full addition operation finding of a carry input
value Carry-in comprises:

outputting the carry input value Carry-in by performing a full addition operation using the
carry value cin for correction and the sign bit of the multiplicand.

43. (Cancelled) 

44. (Cancelled) 

45. (Cancelled) 

46. (Currently Amended) The method of claim 32, further comprising performing a carry
propagation addition operation with the sum and carry terms on the carry value and the sum
value after (m+2) clocks, where m=n/2, when each of the multiplier, the multiplicand and the
modulus has n bits.

47. (Currently Amended) The method of claim 46, further comprising adding a the modulus
second key if an output of the carry propagation addition operation is a negative value to the carry
propagation addition operation result, when a result of the carry propagation addition operation is a
negative value.



48. (Cancelled) 

49. (Cancelled) 

50. (Cancelled) 

51. (Cancelled) 

52. (Cancelled) 

53. (Cancelled) 

54. (Cancelled) 

55. (Cancelled) 

56. (Cancelled) 

57. (Cancelled) 

58. (Cancelled) 

59. (Cancelled) 

60. (Cancelled) 

61. (Cancelled) 

62. (Cancelled) 

63. (Cancelled) 

64. (New) The apparatus of claim 6, wherein the first logic outputs the signed multiplicand 
by selectively performing the one's complementary operation on the output of the multiplexed 
multiplicand based on a sign bit of the Booth conversion result. 
65. (New) The method of claim 32, wherein the full addition operation is performed using at
least two Carry Save Adders (CSAs) each including a plurality of full adders.

66. (New) The method of claim 32, wherein the outputting a signed modulus comprises:

extracting, at every clock, as many first bit values as a predetermined number of bits
beginning from a least significant bit for each of the carry value and the sum value;

outputting second bit values for determining a multiple of modular reduction in the modular 
multiplication based on the first bit values, the carry input value Carry-in and a sign bit of the 
multiplicand; and 

selecting the signed modulus based on the second bit values. 

67. (New) The method of claim 32, wherein the outputting a signed multiplicand 
comprises: 

outputting the signed multiplicand by selectively performing the one's complementary 
operation on the output of the multiplexed multiplicand based on a sign bit of the Booth 
conversion result. 
AMENDMENTS TO THE ABSTRACT

Disclosed is a modular multiplication apparatus for high speed encryption/decryption and
electronic signature in a mobile communication environment including smart cards and mobile
terminals. A signal processing apparatus for performing modular multiplication for use in a
signal processing system includes a first logic for outputting a signed multiplicand by selectively
performing a one's complementary operation on a multiplicand according to a Booth conversion
result of a multiplier in modular multiplication; a second logic for outputting a modulus which is
signed in the modular multiplication based on a carry input value Carry-in of a current clock,
determined from a carry value cin for correction of a previous clock, and on a sign bit of the
multiplicand; and a third logic for receiving the signed multiplicand and the signed modulus, and
calculating a result value of the modular multiplication by iteratively performing a full addition
operation on a carry value C and a sum value S of the full addition operation, found at the
previous clock. The present invention provides an apparatus for performing Montgomery type
modular multiplication for calculating $A \cdot B \cdot R^{-1} \bmod N$ (where $R = 4^{m+2}$) in m+2 (where m=n/2)
clocks with the multiplier A and the multiplicand B, each having n bits as its inputs, wherein bits
of the multiplier are sequentially shifted to generate a shifted bit string and the two least
significant bits of the generated bit string are Booth recorded. The present invention provides a
high-speed modular multiplication apparatus with fewer gates and reduced power consumption.
REMARKS

The Examiner has stated that the claims of the present application contain patentably distinct 
species and is requiring restriction under 35 U.S.C. §121 as follows: 

Species I. Fig. 1, Claims 1-16, 23-27, 32-47; and 
Species II. Fig. 7, Claims 17-22, 28-31, 48-63. 

Prior to entry of this amendment, Claims 1-63 are pending in this application and were 
the subject of an Election of Species Requirement. In response to the Office Action, Applicants 
respectfully elect, without traverse, Species I, which, in view of the above amendment, reads on
Claims 1, 3, 6, 7, 12, 14-16, 32-34, 42, 46, 47 and 64-67 for examination on the merits. The
Specification, Claims 1, 3, 6, 7, 12, 14-16, 32-34, 42, 46 and 47, and the Abstract have been 
amended, and New Claims 64-67 have been added to the application. Please cancel Claims 2, 4, 
5, 8-11, 13, 17-31, 35-41, 43-45, and 48-63 without prejudice. It is respectfully submitted that 
no new subject matter has been added by these amendments. Applicants reserve the right to file 
divisional applications to the non-elected groups of claims. 

A clean copy of the specification, claims and abstract is enclosed for the Examiner's 
convenience. 
Should the Examiner believe that a telephone conference or personal interview would
facilitate resolution of any remaining matters, the Examiner may contact Applicants' attorney at 
the number given below. An early and favorable action is earnestly solicited. 



The Farrell Law Firm 

333 Earle Ovington Boulevard, Suite 701 

Uniondale, New York 11553

(516) 228-3565 -Tel 

(516) 228-8475 -Fax 



Respectfully submitted, 




Paul J. Farrell 
Reg. No. 33,494 
Attorney for Applicant 